Python client library

Once you have installed python-zyte-api and configured your API key, you can use one of its APIs from Python code:

The sync API can be used to build simple, proof-of-concept or debugging Python scripts.
The async API can be used from coroutines, and is meant for production usage, as well as for asyncio environments like Jupyter notebooks.

Sync API

Create a ZyteAPI object, and use its get() method to perform a single request:

from zyte_api import ZyteAPI

client = ZyteAPI()
result = client.get({"url": "https://toscrape.com", "httpResponseBody": True})
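Zyte API returns httpResponseBody as a Base64-encoded string, so the body usually needs decoding before use. The following is a minimal sketch: decode_body is a hypothetical helper name, and the sample dict only stands in for a real response.

```python
import base64


def decode_body(result: dict) -> bytes:
    """Return the raw response body from a Zyte API result dict.

    Assumes the request set "httpResponseBody": True, in which case
    the value is a Base64-encoded string.
    """
    return base64.b64decode(result["httpResponseBody"])


# Stand-in for the dict returned by client.get(); not real API output.
sample_result = {
    "url": "https://toscrape.com",
    "httpResponseBody": base64.b64encode(b"<html></html>").decode("ascii"),
}
body = decode_body(sample_result)
```

With a real client, the same helper would be called as decode_body(result) on the value returned by get().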

To perform multiple requests, use a session() for better performance, and use iter() to send multiple requests in parallel:

from zyte_api import ZyteAPI, RequestError

client = ZyteAPI()
with client.session() as session:
    queries = [
        {"url": "https://toscrape.com", "httpResponseBody": True},
        {"url": "https://books.toscrape.com", "httpResponseBody": True},
    ]
    for result_or_exception in session.iter(queries):
        if isinstance(result_or_exception, dict):
            ...
        elif isinstance(result_or_exception, RequestError):
            ...
        else:
            assert isinstance(result_or_exception, Exception)
            ...

Tip

iter() yields results as they come, not necessarily in their original order. Use echoData to track the source request.
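One way to apply this tip is to store each query's position in echoData and sort the results afterwards. The sketch below uses hypothetical helpers (tag_queries, restore_order) and hand-written stand-in results instead of real API output.

```python
def tag_queries(queries: list[dict]) -> list[dict]:
    """Return copies of the queries with their list index stored in echoData."""
    return [{**query, "echoData": index} for index, query in enumerate(queries)]


def restore_order(results: list[dict]) -> list[dict]:
    """Sort result dicts by the index that was echoed back in echoData."""
    return sorted(results, key=lambda result: result["echoData"])


queries = tag_queries([
    {"url": "https://toscrape.com", "httpResponseBody": True},
    {"url": "https://books.toscrape.com", "httpResponseBody": True},
])

# Stand-ins for results arriving out of order from session.iter(queries).
arrived = [
    {"url": "https://books.toscrape.com", "echoData": 1},
    {"url": "https://toscrape.com", "echoData": 0},
]
ordered = restore_order(arrived)
```

Only successful results carry echoData, so exceptions yielded by iter() need to be filtered out before sorting.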

Async API

Create an AsyncZyteAPI object, and use its get() method to perform a single request:

import asyncio

from zyte_api import AsyncZyteAPI

async def main():
    client = AsyncZyteAPI()
    result = await client.get({"url": "https://toscrape.com", "httpResponseBody": True})

asyncio.run(main())

To perform multiple requests, use a session() for better performance, and use iter() to send multiple requests in parallel:

import asyncio

from zyte_api import AsyncZyteAPI, RequestError


async def main():
    client = AsyncZyteAPI()
    async with client.session() as session:
        queries = [
            {"url": "https://toscrape.com", "httpResponseBody": True},
            {"url": "https://books.toscrape.com", "httpResponseBody": True},
        ]
        for future in session.iter(queries):
            try:
                result = await future
            except RequestError as e:
                ...
            except Exception as e:
                ...


asyncio.run(main())

Tip

iter() yields results as they come, not necessarily in their original order. Use echoData to track the source request.
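The completion-order behavior can be illustrated without network access using asyncio.as_completed(). This is only an analogy to the behavior described above, not the library's own code; fake_fetch is a hypothetical stand-in for a real request, and its delays are arbitrary.

```python
import asyncio


async def fake_fetch(url: str, delay: float) -> dict:
    """Stand-in for a Zyte API request; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return {"url": url, "echoData": url}


async def collect() -> list[str]:
    # The first request is deliberately slower than the second.
    pending = [
        fake_fetch("https://toscrape.com", 0.2),
        fake_fetch("https://books.toscrape.com", 0.01),
    ]
    finished = []
    # as_completed() yields awaitables in completion order, not list order.
    for future in asyncio.as_completed(pending):
        result = await future
        finished.append(result["echoData"])
    return finished


completion_order = asyncio.run(collect())
```

Here the second query finishes first, which is why a field such as echoData is needed to match each result to its request.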

Optimization

ZyteAPI and AsyncZyteAPI use 15 concurrent connections by default.

To change that, use the n_conn parameter when creating your client object:

client = ZyteAPI(n_conn=30)

The number of concurrent connections is enforced across all method calls, including different sessions of the same client.

For guidelines on how to choose the optimal value for you, and other optimization tips, see Optimizing Zyte API usage.
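To make the idea of a shared connection limit concrete, here is a minimal sketch built on asyncio.Semaphore. It assumes nothing about how the library enforces n_conn internally; run_limited and fake_request are hypothetical names, and the sleep stands in for network I/O.

```python
import asyncio


async def run_limited(n_requests: int, n_conn: int) -> int:
    """Run fake requests under one shared limit; return the peak concurrency."""
    semaphore = asyncio.Semaphore(n_conn)  # one limit shared by every call
    active = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for network I/O
            active -= 1

    await asyncio.gather(*(fake_request() for _ in range(n_requests)))
    return peak


peak = asyncio.run(run_limited(n_requests=20, n_conn=5))
```

However many requests are queued, the number running at once never exceeds the limit, which is the effect n_conn is described as having.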

Errors and retries

Methods of ZyteAPI and AsyncZyteAPI automatically handle retries for rate-limiting and unsuccessful responses, as well as network errors.

The default retry policy, zyte_api_retrying, does the following:

Retries rate-limiting responses forever.
Retries unsuccessful responses up to 3 times.
Retries network errors for up to 15 minutes.

All retries are done with an exponential backoff algorithm.
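As a rough illustration of exponential backoff, the sketch below computes capped wait times between retries. The function name and the base, factor, and cap values are assumptions chosen for the example; they are not the settings used by zyte_api_retrying.

```python
def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> list[float]:
    """Return the wait before each retry: base * factor**n, capped at cap."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


# Illustrative values only: each wait doubles until it reaches the cap.
delays = backoff_delays(attempts=8)
```

Real retry policies commonly add random jitter to these waits so that many clients do not retry at the same moment.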

To customize the retry policy, create your own AsyncRetrying object, e.g. using a custom subclass of RetryFactory, and pass it when creating your client object:

client = ZyteAPI(retrying=custom_retry_policy)

When retries are exceeded for a given request, an exception is raised. The exception is the iter() method of the sync API, which yields exceptions instead of raising them, so that a single failed request does not interrupt the entire iteration.

The type of exception depends on the issue that caused the final request attempt to fail. Unsuccessful responses trigger a RequestError and network errors trigger aiohttp exceptions. Other exceptions could be raised; for example, from a custom retry policy.
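Because the sync iter() yields exceptions alongside results, it can help to separate the two before further processing. The sketch below uses a hypothetical split_outcomes helper and simulated outcomes; a built-in ConnectionError stands in for the RequestError or aiohttp exceptions that a real run could produce.

```python
def split_outcomes(outcomes):
    """Separate successful result dicts from exceptions yielded by iter()."""
    results, errors = [], []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            errors.append(outcome)
        else:
            results.append(outcome)
    return results, errors


# Stand-ins for what the sync session.iter() could yield.
outcomes = [
    {"url": "https://toscrape.com"},
    ConnectionError("simulated network failure"),
]
results, errors = split_outcomes(outcomes)
```

The errors list can then be inspected by type, for example to log unsuccessful responses separately from network failures.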