API reference
Sync API
- class ZyteAPI(*, api_key: str | None = None, api_url: str | None = None, n_conn: int = 15, retrying: AsyncRetrying | None = None, user_agent: str | None = None, eth_key: str | None = None, trust_env: bool = False)[source]
api_key is your Zyte API key. If not specified, it is read from the ZYTE_API_KEY environment variable. See API key.
Alternatively, you can set an Ethereum private key through eth_key to use Ethereum for payments. If not specified, it is read from the ZYTE_API_ETH_KEY environment variable. See x402.
api_url is the Zyte API base URL. If set to None, it defaults to "https://api.zyte.com/v1/". If using an Ethereum private key, e.g. through eth_key or through the ZYTE_API_ETH_KEY environment variable, None results in "https://api-x402.zyte.com/v1/" instead.
n_conn is the maximum number of concurrent requests to use. See Optimization.
retrying is the retry policy for requests. Defaults to zyte_api_retrying.
user_agent is the user agent string reported to Zyte API. Defaults to python-zyte-api/<VERSION>.
trust_env controls whether aiohttp honors environment-based network settings (e.g. HTTP_PROXY and HTTPS_PROXY). Defaults to False.
Tip: To change the User-Agent header sent to a target website, use customHttpRequestHeaders instead.
- get(query: dict[str, Any], *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) dict[str, Any][source]
Send query to Zyte API and return the result.
endpoint is the Zyte API endpoint path relative to the client object api_url.
session is the network session to use. Consider using session() instead of this parameter.
handle_retries determines whether or not a retry policy should be used.
retrying is the retry policy to use, provided handle_retries is True. If not specified, the default retry policy is used.
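The api_url default described for the client constructor, and the way endpoint is resolved relative to it, can be illustrated with a small standalone sketch. The names resolve_api_url and endpoint_url are hypothetical helpers written for this example and are not part of the library. Using urljoin for the relative resolution is an assumption, and the sketch does not model how the client chooses between api_key and eth_key when both are available.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urljoin

DEFAULT_API_URL = "https://api.zyte.com/v1/"
X402_API_URL = "https://api-x402.zyte.com/v1/"


def resolve_api_url(
    api_url: Optional[str] = None,
    eth_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper mirroring the documented api_url default."""
    env = os.environ if env is None else env
    if api_url is not None:
        # An explicit base URL always wins.
        return api_url
    if eth_key or env.get("ZYTE_API_ETH_KEY"):
        # An Ethereum private key switches the default to the x402 URL.
        return X402_API_URL
    return DEFAULT_API_URL


def endpoint_url(base_url: str, endpoint: str = "extract") -> str:
    """Resolve an endpoint path relative to the base URL (assumed behavior)."""
    return urljoin(base_url, endpoint)


print(endpoint_url(resolve_api_url(env={})))
print(endpoint_url(resolve_api_url(eth_key="0xabc", env={})))
```

Passing env explicitly keeps the sketch deterministic. Leaving it out makes the helper read the real process environment.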
- iter(queries: list[dict[str, Any]], *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) Generator[dict[str, Any] | Exception, None, None][source]
Send multiple queries to Zyte API in parallel and iterate over their results as they come.
The number of queries can exceed the n_conn parameter set on the client object. Extra queries are queued, so at most n_conn requests are processed in parallel at any time.
Results may come in a different order from the original list of queries. You can use echoData to attach metadata to queries, and later use that metadata to restore their original order.
When exceptions occur, they are yielded, not raised.
The remaining parameters work the same as in get().
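As a minimal sketch of the echoData approach, each query below carries its position in the original list, and a hypothetical helper, restore_order, uses that value to put results back in place while collecting yielded exceptions separately. The simulated_results list is made-up data standing in for what client.iter(queries) would yield, so the example runs without network access or an API key.

```python
from typing import Any, Iterable, Optional

urls = ["https://a.example", "https://b.example", "https://c.example"]

# Attach the original position of each query as echoData.
queries = [
    {"url": url, "httpResponseBody": True, "echoData": index}
    for index, url in enumerate(urls)
]


def restore_order(
    results: Iterable[Any], total: int
) -> tuple[list[Optional[dict[str, Any]]], list[Exception]]:
    """Hypothetical helper: reorder results by echoData, keep errors aside."""
    ordered: list[Optional[dict[str, Any]]] = [None] * total
    errors: list[Exception] = []
    for result in results:
        if isinstance(result, Exception):
            # iter() yields exceptions instead of raising them.
            errors.append(result)
            continue
        ordered[result["echoData"]] = result
    return ordered, errors


# Made-up stand-in for client.iter(queries): out of order, with one failure.
simulated_results = [
    {"url": urls[2], "echoData": 2},
    RuntimeError("simulated failure"),
    {"url": urls[0], "echoData": 0},
]

ordered, errors = restore_order(simulated_results, len(queries))
print([item["url"] if item else None for item in ordered])
print(len(errors))
```

Positions left as None correspond to queries whose result was an exception. With a real client, the call would be restore_order(client.iter(queries), len(queries)).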
- session(**kwargs: Any) _Session[source]
Context manager to create a session.
A session is an object that has the same API as the client object, except:
  - get() and iter() do not have a session parameter; the session creates an aiohttp.ClientSession object and passes it to get() and iter() automatically.
  - It does not have a session() method.
Using the same aiohttp.ClientSession object for all Zyte API requests improves performance by keeping a pool of reusable connections to Zyte API.
The aiohttp.ClientSession object is created with sane defaults for Zyte API, but you can use kwargs to pass additional parameters to aiohttp.ClientSession and even override those sane defaults.
You do not need to use session() as a context manager as long as you call close() on the object it returns when you are done:
    session = client.session()
    try:
        ...
    finally:
        session.close()
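For comparison, the context-manager form might look like the sketch below. The function name fetch_all and the httpResponseBody query field are illustrative choices. zyte_api is imported inside the function only so that the snippet can be loaded without the package installed. Calling it requires a valid API key and network access.

```python
from typing import Any


def fetch_all(urls: list[str], api_key: str) -> list[dict[str, Any]]:
    """Illustrative helper: send one request per URL over a shared session."""
    # Imported lazily; requires the zyte-api package to be installed.
    from zyte_api import ZyteAPI

    client = ZyteAPI(api_key=api_key)
    # Leaving the with block closes the session and its connection pool.
    with client.session() as session:
        return [
            session.get({"url": url, "httpResponseBody": True})
            for url in urls
        ]
```

All requests inside the with block share the same underlying connection pool, which is the performance benefit described above.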
Async API
- class AsyncZyteAPI(*, api_key: str | None = None, api_url: str | None = None, n_conn: int = 15, retrying: AsyncRetrying | None = None, user_agent: str | None = None, eth_key: str | None = None, trust_env: bool = False)[source]
Parameters work the same as for ZyteAPI.
- async get(query: dict[str, Any], *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) dict[str, Any][source]
Asynchronous equivalent to ZyteAPI.get().
- iter(queries: list[dict[str, Any]], *, endpoint: str = 'extract', session: aiohttp.ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) Iterator[_ResponseFuture][source]
Asynchronous equivalent to ZyteAPI.iter().
Note: Yielded futures, when awaited, do raise their exceptions, instead of only returning them.
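Because awaiting a yielded future raises, the consuming loop typically wraps each await in try/except. The sketch below shows that pattern with the standard library only: fake_request is a hypothetical coroutine standing in for a Zyte API request, and asyncio.as_completed stands in for client.iter(queries), on the assumption that the two yield awaitables in a comparable way.

```python
import asyncio
from typing import Any


async def fake_request(query: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for a single Zyte API request."""
    await asyncio.sleep(0)
    if query.get("fail"):
        raise RuntimeError(f"simulated error for {query['url']}")
    return {"url": query["url"]}


async def collect(
    queries: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[Exception]]:
    results: list[dict[str, Any]] = []
    errors: list[Exception] = []
    # With a real client, this loop would be: for future in client.iter(queries)
    for future in asyncio.as_completed([fake_request(q) for q in queries]):
        try:
            results.append(await future)
        except Exception as exc:
            # Awaiting the future raises, so failures are caught here.
            errors.append(exc)
    return results, errors


queries = [
    {"url": "https://a.example"},
    {"url": "https://b.example", "fail": True},
    {"url": "https://c.example"},
]
results, errors = asyncio.run(collect(queries))
print(sorted(result["url"] for result in results))
print(len(errors))
```

Catching exceptions per future keeps one failed query from aborting the rest of the batch.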
- session(**kwargs: Any) _AsyncSession[source]
Asynchronous equivalent to ZyteAPI.session().
You do not need to use session() as an async context manager as long as you await close() on the object it returns when you are done:
    session = client.session()
    try:
        ...
    finally:
        await session.close()
Retries
- zyte_api_retrying
- aggressive_retrying
- class RetryFactory[source]
Factory class that builds the tenacity.AsyncRetrying object that defines the default retry policy.
To create a custom retry policy, you can subclass this factory class, modify it as needed, and then call build() on your subclass to get the corresponding tenacity.AsyncRetrying object.
For example, to double the number of attempts for download errors and the time network errors are retried:
    from zyte_api import (
        RetryFactory,
        stop_after_uninterrupted_delay,
        stop_on_download_error,
    )


    class CustomRetryFactory(RetryFactory):
        network_error_stop = stop_after_uninterrupted_delay(30 * 60)
        download_error_stop = stop_on_download_error(max_total=8, max_permanent=4)


    CUSTOM_RETRY_POLICY = CustomRetryFactory().build()
- class AggressiveRetryFactory[source]
Factory class that builds the tenacity.AsyncRetrying object that defines the aggressive retry policy.
To create a custom retry policy, you can subclass this factory class, modify it as needed, and then call build() on your subclass to get the corresponding tenacity.AsyncRetrying object.
For example, to double the maximum number of attempts for all error responses and double the time network errors are retried:
    from zyte_api import (
        AggressiveRetryFactory,
        stop_after_uninterrupted_delay,
        stop_on_count,
        stop_on_download_error,
    )


    class CustomRetryFactory(AggressiveRetryFactory):
        download_error_stop = stop_on_download_error(max_total=16, max_permanent=8)
        network_error_stop = stop_after_uninterrupted_delay(30 * 60)
        undocumented_error_stop = stop_on_count(8)


    CUSTOM_RETRY_POLICY = CustomRetryFactory().build()
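A retry policy, whether built-in or produced by a factory subclass, is applied through the retrying parameter documented for the client constructor and for get(). The following is a minimal sketch: the function name get_with_aggressive_retries is illustrative, zyte_api is imported lazily so that the snippet loads without the package, and running it requires a valid API key and network access.

```python
from typing import Any


def get_with_aggressive_retries(
    query: dict[str, Any], api_key: str
) -> dict[str, Any]:
    """Illustrative helper: override the retry policy for a single request."""
    # Imported lazily; requires the zyte-api package to be installed.
    from zyte_api import ZyteAPI, aggressive_retrying

    # The client keeps its default policy (zyte_api_retrying)...
    client = ZyteAPI(api_key=api_key)
    # ...while this one call uses the aggressive policy instead.
    return client.get(query, retrying=aggressive_retrying)
```

To apply a policy to every request, pass it to the constructor instead, as in ZyteAPI(api_key=api_key, retrying=policy).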
Errors
- exception RequestError(*args: Any, **kwargs: Any)[source]
Exception raised upon receiving a rate-limiting or unsuccessful response from Zyte API.
- property parsed: ParsedError
Response as a ParsedError object.
- class ParsedError(response_body: bytes, data: dict | None, parse_error: str | None)[source]
Parsed error response body from Zyte API.
- data: dict | None
JSON-decoded response body.
If None, parse_error indicates the reason.
- classmethod from_body(response_body: bytes) ParsedError[source]
Return a ParsedError object built out of the specified error response body.
- parse_error: str | None
If data is None, this indicates whether the reason is that response_body is not valid JSON ("bad_json") or that it is not a JSON object ("bad_format").
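The relationship between data and parse_error can be illustrated with a standalone sketch. ParsedErrorSketch is a hypothetical re-implementation written from the descriptions above, not the library's actual code. It only shows how a body maps to either decoded data or one of the two documented parse_error values.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedErrorSketch:
    """Hypothetical model of the documented ParsedError fields."""

    response_body: bytes
    data: Optional[dict]
    parse_error: Optional[str]

    @classmethod
    def from_body(cls, response_body: bytes) -> "ParsedErrorSketch":
        try:
            data = json.loads(response_body.decode("utf-8"))
        except ValueError:
            # Covers both invalid UTF-8 and invalid JSON.
            return cls(response_body, None, "bad_json")
        if not isinstance(data, dict):
            # Valid JSON, but not a JSON object.
            return cls(response_body, None, "bad_format")
        return cls(response_body, data, None)


for body in (b'{"type": "/limits/over-user-limit"}', b"not json", b"[1, 2]"):
    parsed = ParsedErrorSketch.from_body(body)
    print(parsed.data, parsed.parse_error)
```

With the real classes, the equivalent object is available as the parsed property of a caught RequestError.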