API reference

Sync API

class ZyteAPI(*, api_key=None, api_url='https://api.zyte.com/v1/', n_conn=15, retrying: AsyncRetrying | None = None, user_agent: str | None = None)

Synchronous Zyte API client.

api_key is your Zyte API key. If not specified, it is read from the ZYTE_API_KEY environment variable. See API key.

api_url is the Zyte API base URL.

n_conn is the maximum number of concurrent requests to use. See Optimization.

retrying is the retry policy for requests. Defaults to zyte_api_retrying.

user_agent is the user agent string reported to Zyte API. Defaults to python-zyte-api/<VERSION>.

Tip

To change the User-Agent header sent to a target website, use customHttpRequestHeaders instead.
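
As a rough illustration of the tip above, the following sketch builds a query that overrides the User-Agent header sent to the target website. The helper name build_query and the header value are hypothetical, and the list-of-name/value shape used for customHttpRequestHeaders is an assumption about the Zyte API request format, which this page does not spell out.

```python
def build_query(url: str, target_user_agent: str) -> dict:
    """Return a query that overrides the User-Agent sent to the target website."""
    return {
        "url": url,
        "httpResponseBody": True,
        # Assumed shape: a list of {"name": ..., "value": ...} objects.
        "customHttpRequestHeaders": [
            {"name": "User-Agent", "value": target_user_agent},
        ],
    }


query = build_query("https://example.com", "my-crawler/1.0")
```

The resulting dictionary would then be passed to get() or iter(); the user_agent parameter of the client is unaffected by it.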

get(query: dict, *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) -> dict

Send query to Zyte API and return the result.

endpoint is the Zyte API endpoint path relative to the client object api_url.

session is the network session to use. Consider using session() instead of this parameter.

handle_retries determines whether or not a retry policy should be used.

retrying is the retry policy to use, provided handle_retries is True. If not specified, the default retry policy is used.
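
A minimal sketch of a single request, assuming the query asks for httpResponseBody and that Zyte API returns that field as a Base64-encoded string. The helper name fetch_body and its retry flag are hypothetical. client is expected to be a ZyteAPI instance, but any object with a compatible get() method works, which keeps the snippet runnable without network access.

```python
from base64 import b64decode


def fetch_body(client, url: str, *, retry: bool = True) -> bytes:
    """Fetch url through client.get() and return the decoded response body."""
    result = client.get(
        {"url": url, "httpResponseBody": True},
        # Passing handle_retries=False disables the retry policy for this call.
        handle_retries=retry,
    )
    # Assumption: httpResponseBody is a Base64-encoded string.
    return b64decode(result["httpResponseBody"])
```

With a real client this would be called as fetch_body(ZyteAPI(), "https://example.com"), with the API key read from the ZYTE_API_KEY environment variable.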

iter(queries: List[dict], *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) -> Generator[dict | Exception, None, None]

Send multiple queries to Zyte API in parallel and iterate over their results as they come.

The number of queries can exceed the n_conn parameter set on the client object. Extra queries are queued, so that at most n_conn requests are processed in parallel at any time.

Results may come in a different order from the original list of queries. You can use echoData to attach metadata to queries, and later use that metadata to restore their original order.

When exceptions occur, they are yielded, not raised.

The remaining parameters work the same as in get().
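
The ordering and error-handling behavior described above can be sketched as follows. The helper name fetch_in_order is hypothetical, and storing the list index in echoData is just one possible choice of metadata. client is expected to be a ZyteAPI instance, though any object with a compatible iter() method works.

```python
def fetch_in_order(client, urls):
    """Return (results in the order of urls, list of yielded exceptions)."""
    queries = [
        # echoData is sent back with each result; here it carries the index.
        {"url": url, "httpResponseBody": True, "echoData": index}
        for index, url in enumerate(urls)
    ]
    ordered = [None] * len(queries)  # None marks a query that failed
    errors = []
    for result in client.iter(queries):
        if isinstance(result, Exception):
            # iter() yields exceptions instead of raising them.
            errors.append(result)
        else:
            ordered[result["echoData"]] = result
    return ordered, errors
```

Because a yielded exception is not matched back to its query here, failed positions simply stay None; a fuller implementation might inspect the exception to recover the query it belongs to.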

session(**kwargs)

Context manager to create a session.

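A session is an object that has the same API as the client object, except that its get() and iter() methods do not take a session parameter (the session creates an aiohttp.ClientSession object and passes it to them automatically) and it has no session() method of its own.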

Using the same aiohttp.ClientSession object for all Zyte API requests improves performance by keeping a pool of reusable connections to Zyte API.

The aiohttp.ClientSession object is created with sane defaults for Zyte API, but you can use kwargs to pass additional parameters to aiohttp.ClientSession and even override those sane defaults.

You do not need to use session() as a context manager as long as you call close() on the object it returns when you are done:

session = client.session()
try:
    ...
finally:
    session.close()
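
For comparison, the context-manager form can be sketched as below. The helper name fetch_many is hypothetical. client is expected to be a ZyteAPI instance, though any object whose session() returns a context manager with a compatible get() method works.

```python
def fetch_many(client, urls):
    """Send one request per URL over a single shared session."""
    results = []
    # The session is closed automatically when the with block exits.
    with client.session() as session:
        for url in urls:
            results.append(session.get({"url": url, "httpResponseBody": True}))
    return results
```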

Async API

class AsyncZyteAPI(*, api_key=None, api_url='https://api.zyte.com/v1/', n_conn=15, retrying: AsyncRetrying | None = None, user_agent: str | None = None)

Asynchronous Zyte API client.

Parameters work the same as for ZyteAPI.

async get(query: dict, *, endpoint: str = 'extract', session=None, handle_retries=True, retrying: AsyncRetrying | None = None) -> Future

Asynchronous equivalent to ZyteAPI.get().

iter(queries: List[dict], *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries=True, retrying: AsyncRetrying | None = None) -> Iterator[Future]

Asynchronous equivalent to ZyteAPI.iter().

Note

Yielded futures, when awaited, do raise their exceptions, instead of only returning them.
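
The difference noted above means that each awaited future needs its own try block. A minimal sketch, with the hypothetical helper name collect_results and client expected to be an AsyncZyteAPI instance (any object whose iter() returns awaitables works):

```python
import asyncio


async def collect_results(client, queries):
    """Await every future from client.iter(), separating results from errors."""
    results, errors = [], []
    for future in client.iter(queries):
        try:
            results.append(await future)
        except Exception as exc:
            # Unlike ZyteAPI.iter(), failures surface as raised exceptions.
            errors.append(exc)
    return results, errors
```

It would typically be driven with asyncio.run(collect_results(client, queries)).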

session(**kwargs)

Asynchronous equivalent to ZyteAPI.session().

You do not need to use session() as an async context manager as long as you await close() on the object it returns when you are done:

session = client.session()
try:
    ...
finally:
    await session.close()
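
The async context-manager form might look like the sketch below. The helper name fetch_many_async is hypothetical, and running the requests through asyncio.gather() is a design choice of this example, not something the client requires.

```python
import asyncio


async def fetch_many_async(client, urls):
    """Send one request per URL concurrently over a single shared session."""
    # The session is closed automatically when the async with block exits.
    async with client.session() as session:
        pending = [
            session.get({"url": url, "httpResponseBody": True}) for url in urls
        ]
        # gather() returns results in the same order as urls.
        return await asyncio.gather(*pending)
```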

Retries

zyte_api_retrying

Default retry policy.

class RetryFactory

Factory class that builds the tenacity.AsyncRetrying object that defines the default retry policy.

To create a custom retry policy, you can subclass this factory class, modify it as needed, and then call build() on your subclass to get the corresponding tenacity.AsyncRetrying object.

For example, to increase the maximum number of attempts for temporary download errors from 4 (i.e. 3 retries) to 10 (i.e. 9 retries):

from tenacity import stop_after_attempt
from zyte_api import RetryFactory


class CustomRetryFactory(RetryFactory):
    temporary_download_error_stop = stop_after_attempt(10)


CUSTOM_RETRY_POLICY = CustomRetryFactory().build()

To retry permanent download errors, treating them the same as temporary download errors:

from tenacity import RetryCallState, retry_if_exception
from zyte_api import RequestError, RetryFactory


def is_permanent_download_error(exc: BaseException) -> bool:
    return isinstance(exc, RequestError) and exc.status == 521


class CustomRetryFactory(RetryFactory):

    retry_condition = RetryFactory.retry_condition | retry_if_exception(
        is_permanent_download_error
    )

    def wait(self, retry_state: RetryCallState) -> float:
        if is_permanent_download_error(retry_state.outcome.exception()):
            return self.temporary_download_error_wait(retry_state=retry_state)
        return super().wait(retry_state)

    def stop(self, retry_state: RetryCallState) -> bool:
        if is_permanent_download_error(retry_state.outcome.exception()):
            return self.temporary_download_error_stop(retry_state)
        return super().stop(retry_state)


CUSTOM_RETRY_POLICY = CustomRetryFactory().build()

Errors

exception RequestError(*args, **kwargs)

Exception raised upon receiving a rate-limiting or unsuccessful response from Zyte API.

property parsed

Response as a ParsedError object.

request_id: str | None

Request ID.

response_content: bytes | None

Response body.
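
As an illustration of how these attributes might be combined for logging, here is a sketch with the hypothetical helper name describe_request_error. It reads only attributes mentioned on this page (status appears in the retry example above) and accepts any object that exposes them.

```python
def describe_request_error(exc) -> dict:
    """Collect the fields of a RequestError-like exception that are useful in logs."""
    return {
        "status": getattr(exc, "status", None),
        "request_id": exc.request_id,
        # parsed.type is typed str | None, so callers should expect None.
        "type": exc.parsed.type,
        # response_content may be None, hence the fallback to empty bytes.
        "body_size": len(exc.response_content or b""),
    }
```

In real code it would be called from an except RequestError block around get(), or on exceptions yielded by iter().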

class ParsedError(response_body: bytes, data: dict | None, parse_error: str | None)

Parsed error response body from Zyte API.

data: dict | None

JSON-decoded response body.

If None, parse_error indicates the reason.

classmethod from_body(response_body: bytes) -> ParsedError

Return a ParsedError object built out of the specified error response body.

parse_error: str | None

If data is None, this indicates whether the reason is that response_body is not valid JSON ("bad_json") or that it is not a JSON object ("bad_format").

response_body: bytes

Raw response body from Zyte API.

property type: str | None

ID of the error type, e.g. "/limits/over-user-limit" or "/download/temporary-error".
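
To make the relationship between data, parse_error and type concrete, the standalone sketch below mirrors the behavior documented above using only the standard library. ParsedBody and parse_body are hypothetical stand-ins, not the library's implementation, and reading the error type from a "type" key of the decoded body is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedBody:
    """Hypothetical stand-in for ParsedError, for illustration only."""

    response_body: bytes
    data: Optional[dict]
    parse_error: Optional[str]

    @property
    def type(self) -> Optional[str]:
        # Assumption: the error type ID is stored under the "type" key.
        return (self.data or {}).get("type")


def parse_body(response_body: bytes) -> ParsedBody:
    """Decode an error body, recording why decoding failed if it did."""
    try:
        data = json.loads(response_body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return ParsedBody(response_body, None, "bad_json")
    if not isinstance(data, dict):
        # Valid JSON, but not a JSON object.
        return ParsedBody(response_body, None, "bad_format")
    return ParsedBody(response_body, data, None)
```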