API reference
Sync API
- class ZyteAPI(*, api_key=None, api_url='https://api.zyte.com/v1/', n_conn=15, retrying: AsyncRetrying | None = None, user_agent: str | None = None)[source]
api_key is your Zyte API key. If not specified, it is read from the ZYTE_API_KEY environment variable. See API key.
api_url is the Zyte API base URL.
n_conn is the maximum number of concurrent requests to use. See Optimization.
retrying is the retry policy for requests. Defaults to zyte_api_retrying.
user_agent is the user agent string reported to Zyte API. Defaults to python-zyte-api/<VERSION>.
Tip
To change the User-Agent header sent to a target website, use customHttpRequestHeaders instead.
- get(query: dict, *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) -> dict [source]
Send query to Zyte API and return the result.
endpoint is the Zyte API endpoint path relative to the client object api_url.
session is the network session to use. Consider using session() instead of this parameter.
handle_retries determines whether or not a retry policy should be used.
retrying is the retry policy to use, provided handle_retries is True. If not specified, the default retry policy is used.
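The call described above can be sketched as follows. This is a minimal illustration, not the library's own example: the helper fetch_body and the stand-in FakeClient are hypothetical names, and the httpResponseBody field (returned base64-encoded) is assumed from the Zyte API request format rather than taken from this page. The stand-in lets the sketch run offline; with the real client you would pass a ZyteAPI instance instead.

```python
import base64


def fetch_body(client, url: str) -> bytes:
    """Send one query through client.get() and return the decoded body."""
    # Assumption: "httpResponseBody" is requested with True and comes back
    # as a base64-encoded string under the same key.
    result = client.get({"url": url, "httpResponseBody": True})
    return base64.b64decode(result["httpResponseBody"])


class FakeClient:
    """Hypothetical offline stand-in that mimics the shape of ZyteAPI.get()."""

    def get(self, query: dict) -> dict:
        body = f"<html>{query['url']}</html>".encode()
        return {
            "url": query["url"],
            "httpResponseBody": base64.b64encode(body).decode(),
        }


# With the real client this would be: fetch_body(ZyteAPI(), "https://example.com")
page = fetch_body(FakeClient(), "https://example.com")
```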
- iter(queries: list[dict], *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) -> Generator[dict | Exception, None, None] [source]
Send multiple queries to Zyte API in parallel and iterate over their results as they come.
The number of queries can exceed the n_conn parameter set on the client object. Extra queries are queued, so at most n_conn requests are processed in parallel at any time.
Results may come in a different order from the original list of queries. You can use echoData to attach metadata to queries, and later use that metadata to restore their original order.
When exceptions occur, they are yielded, not raised.
The remaining parameters work the same as in get().
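One way to restore the original order and to separate yielded exceptions from results is sketched below. The helper restore_order is hypothetical, and the sketch assumes that each result dict echoes the echoData value of its query. The list named simulated stands in for what client.iter(queries) might yield, so the block runs without network access.

```python
def restore_order(items, n_queries):
    """Put results back at their original index; collect yielded exceptions."""
    ordered = [None] * n_queries
    errors = []
    for item in items:
        if isinstance(item, Exception):
            # iter() yields exceptions instead of raising them.
            errors.append(item)
        else:
            # Assumption: the response carries the query's echoData value.
            ordered[item["echoData"]] = item
    return ordered, errors


urls = ["https://a.example", "https://b.example", "https://c.example"]
# Tag each query with its position in the original list.
queries = [
    {"url": url, "httpResponseBody": True, "echoData": index}
    for index, url in enumerate(urls)
]

# Simulated output of client.iter(queries): out of order, with one failure.
simulated = [
    {"url": urls[2], "echoData": 2},
    RuntimeError("stand-in for a failed request"),
    {"url": urls[0], "echoData": 0},
]
ordered, errors = restore_order(simulated, len(queries))
```

Slots left as None correspond to queries whose result was an exception or never arrived.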
- session(**kwargs)[source]
Context manager to create a session.
A session is an object that has the same API as the client object, except:
  - get() and iter() do not have a session parameter. The session creates an aiohttp.ClientSession object and passes it to get() and iter() automatically.
  - It does not have a session() method.
Using the same aiohttp.ClientSession object for all Zyte API requests improves performance by keeping a pool of reusable connections to Zyte API.
The aiohttp.ClientSession object is created with sane defaults for Zyte API, but you can use kwargs to pass additional parameters to aiohttp.ClientSession and even override those sane defaults.
You do not need to use session() as a context manager as long as you call close() on the object it returns when you are done:

    session = client.session()
    try:
        ...
    finally:
        session.close()
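The context-manager form can be sketched like this. The helper fetch_all and the FakeSession and FakeClient classes are hypothetical stand-ins so the block runs offline; they only mimic the behaviour described above (a session with get() and close(), created by client.session()). With the real library, client would be a ZyteAPI instance.

```python
from contextlib import contextmanager


def fetch_all(client, urls):
    """Send one query per URL over a single shared session."""
    results = []
    # The session is closed automatically when the block exits.
    with client.session() as session:
        for url in urls:
            results.append(session.get({"url": url}))
    return results


class FakeSession:
    """Hypothetical stand-in for the object returned by session()."""

    def __init__(self):
        self.closed = False

    def get(self, query: dict) -> dict:
        return {"url": query["url"]}

    def close(self):
        self.closed = True


class FakeClient:
    """Hypothetical stand-in for ZyteAPI that records the sessions it creates."""

    def __init__(self):
        self.sessions = []

    @contextmanager
    def session(self, **kwargs):
        session = FakeSession()
        self.sessions.append(session)
        try:
            yield session
        finally:
            session.close()


client = FakeClient()
results = fetch_all(client, ["https://a.example", "https://b.example"])
```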
Async API
- class AsyncZyteAPI(*, api_key=None, api_url='https://api.zyte.com/v1/', n_conn=15, retrying: AsyncRetrying | None = None, user_agent: str | None = None)[source]
Parameters work the same as for ZyteAPI.
- async get(query: dict, *, endpoint: str = 'extract', session=None, handle_retries=True, retrying: AsyncRetrying | None = None) -> _ResponseFuture [source]
Asynchronous equivalent to ZyteAPI.get().
- iter(queries: list[dict], *, endpoint: str = 'extract', session: aiohttp.ClientSession | None = None, handle_retries=True, retrying: AsyncRetrying | None = None) -> Iterator[_ResponseFuture] [source]
Asynchronous equivalent to ZyteAPI.iter().
Note
Unlike ZyteAPI.iter(), which yields exceptions, the futures yielded here raise their exceptions when awaited.
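Because awaiting a yielded future raises, each await typically needs its own try/except. The sketch below shows that pattern under stated assumptions: gather_results and FakeAsyncClient are hypothetical names, and the fake client only imitates the documented shape of AsyncZyteAPI.iter() (an iterator of awaitables) so the block runs offline.

```python
import asyncio


async def gather_results(client, queries):
    """Await each yielded future, separating results from raised errors."""
    results, errors = [], []
    for future in client.iter(queries):
        try:
            results.append(await future)
        except Exception as exc:
            # In the async API, exceptions are raised on await.
            errors.append(exc)
    return results, errors


class FakeAsyncClient:
    """Hypothetical stand-in for AsyncZyteAPI."""

    async def _request(self, query: dict) -> dict:
        if "fail" in query["url"]:
            raise RuntimeError("stand-in for a failed request")
        return {"url": query["url"]}

    def iter(self, queries):
        # Yield one awaitable per query, like the documented iterator.
        return (self._request(query) for query in queries)


queries = [{"url": "https://ok.example"}, {"url": "https://fail.example"}]
results, errors = asyncio.run(gather_results(FakeAsyncClient(), queries))
```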
- session(**kwargs)[source]
Asynchronous equivalent to ZyteAPI.session().
You do not need to use session() as an async context manager as long as you await close() on the object it returns when you are done:

    session = client.session()
    try:
        ...
    finally:
        await session.close()
Retries
- zyte_api_retrying
- aggressive_retrying
- class RetryFactory[source]
Factory class that builds the tenacity.AsyncRetrying object that defines the default retry policy.
To create a custom retry policy, you can subclass this factory class, modify it as needed, and then call build() on your subclass to get the corresponding tenacity.AsyncRetrying object.
For example, to double the number of attempts for download errors and the time network errors are retried:

    from zyte_api import (
        RetryFactory,
        stop_after_uninterrupted_delay,
        stop_on_download_error,
    )


    class CustomRetryFactory(RetryFactory):
        network_error_stop = stop_after_uninterrupted_delay(30 * 60)
        download_error_stop = stop_on_download_error(max_total=8, max_permanent=4)


    CUSTOM_RETRY_POLICY = CustomRetryFactory().build()
- class AggressiveRetryFactory[source]
Factory class that builds the tenacity.AsyncRetrying object that defines the aggressive retry policy.
To create a custom retry policy, you can subclass this factory class, modify it as needed, and then call build() on your subclass to get the corresponding tenacity.AsyncRetrying object.
For example, to double the maximum number of attempts for all error responses and double the time network errors are retried:

    from zyte_api import (
        AggressiveRetryFactory,
        stop_after_uninterrupted_delay,
        stop_on_count,
        stop_on_download_error,
    )


    class CustomRetryFactory(AggressiveRetryFactory):
        download_error_stop = stop_on_download_error(max_total=16, max_permanent=8)
        network_error_stop = stop_after_uninterrupted_delay(30 * 60)
        undocumented_error_stop = stop_on_count(8)


    CUSTOM_RETRY_POLICY = CustomRetryFactory().build()
Errors
- exception RequestError(*args, **kwargs)[source]
Exception raised upon receiving a rate-limiting or unsuccessful response from Zyte API.
- property parsed
Response as a ParsedError object.
- class ParsedError(response_body: bytes, data: dict | None, parse_error: str | None)[source]
Parsed error response body from Zyte API.
- data: dict | None
JSON-decoded response body.
If None, parse_error indicates the reason.
- classmethod from_body(response_body: bytes) -> ParsedError [source]
Return a ParsedError object built out of the specified error response body.
- parse_error: str | None
If data is None, this indicates whether the reason is that response_body is not valid JSON ("bad_json") or that it is not a JSON object ("bad_format").
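The relationship between data and parse_error can be illustrated with a small standalone function. This is a sketch of the rules as described above, not the library's implementation: parse_error_body is a hypothetical name, and it uses only the standard json module.

```python
import json


def parse_error_body(response_body: bytes):
    """Return (data, parse_error) following the rules described above."""
    try:
        data = json.loads(response_body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        return None, "bad_json"
    if not isinstance(data, dict):
        # Valid JSON, but not a JSON object.
        return None, "bad_format"
    return data, None


ok = parse_error_body(b'{"status": 429}')
not_json = parse_error_body(b"not json")
not_object = parse_error_body(b"[1, 2]")
```

With the real library, the equivalent values are exposed through ParsedError.from_body(response_body) or the parsed property of a RequestError.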