API reference
Sync API
- class ZyteAPI(*, api_key=None, api_url='https://api.zyte.com/v1/', n_conn=15, retrying: AsyncRetrying | None = None, user_agent: str | None = None)
api_key is your Zyte API key. If not specified, it is read from the ZYTE_API_KEY environment variable. See API key.
api_url is the Zyte API base URL.
n_conn is the maximum number of concurrent requests to use. See Optimization.
retrying is the retry policy for requests. Defaults to zyte_api_retrying.
user_agent is the user agent string reported to Zyte API. Defaults to python-zyte-api/<VERSION>.
Tip
To change the User-Agent header sent to a target website, use customHttpRequestHeaders instead.
- get(query: dict, *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) dict
Send query to Zyte API and return the result.
endpoint is the Zyte API endpoint path relative to the client object api_url.
session is the network session to use. Consider using session() instead of this parameter.
handle_retries determines whether or not a retry policy should be used.
retrying is the retry policy to use, provided handle_retries is True. If not specified, the default retry policy is used.
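As a rough illustration of get(), the sketch below requests the raw response body of a URL and decodes it. The fetch_body helper is hypothetical. It assumes that client is a ZyteAPI instance and that the httpResponseBody field of the result is a Base64-encoded string, which is how the Zyte API HTTP reference describes it.

```python
from base64 import b64decode


def fetch_body(client, url: str) -> bytes:
    """Return the decoded response body of ``url``.

    ``client`` is expected to be a ``zyte_api.ZyteAPI`` instance, or any
    object with a compatible ``get()`` method.
    """
    # Assumption: httpResponseBody comes back as a Base64-encoded string.
    result = client.get({"url": url, "httpResponseBody": True})
    return b64decode(result["httpResponseBody"])


# Hypothetical usage (needs the zyte-api package and a valid API key):
#     from zyte_api import ZyteAPI
#     body = fetch_body(ZyteAPI(), "https://example.com")
```

Taking the client as an argument keeps the helper easy to exercise with a stand-in object in tests.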
- iter(queries: List[dict], *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries: bool = True, retrying: AsyncRetrying | None = None) Generator[dict | Exception, None, None]
Send multiple queries to Zyte API in parallel and iterate over their results as they come.
The number of queries can exceed the n_conn parameter set on the client object. Extra queries are queued, so at most n_conn requests are processed in parallel at any time.
Results may come in a different order from the original list of queries. You can use echoData to attach metadata to queries, and later use that metadata to restore their original order.
When exceptions occur, they are yielded, not raised.
The remaining parameters work the same as in get().
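One possible way to restore the original order with echoData, and to separate yielded exceptions from results, is sketched below. The fetch_in_order helper and the choice of httpResponseBody as the requested field are illustrative assumptions. The sketch also assumes that each successful result carries back the echoData value sent with its query.

```python
def fetch_in_order(client, urls):
    """Fetch ``urls`` with ``client.iter()`` and return (results, errors).

    ``results`` follows the order of ``urls``; entries stay ``None`` for
    queries that did not produce a result.
    """
    # Tag each query with its position so the result can be put back in place.
    queries = [
        {"url": url, "httpResponseBody": True, "echoData": index}
        for index, url in enumerate(urls)
    ]
    results = [None] * len(queries)
    errors = []
    for item in client.iter(queries):
        if isinstance(item, Exception):
            # Exceptions are yielded, not raised, so collect them here.
            errors.append(item)
        else:
            results[item["echoData"]] = item
    return results, errors
```

Because a yielded exception is not tied to a list position in this sketch, failed queries show up as None in results and as entries in errors.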
- session(**kwargs)
Context manager to create a session.
A session is an object that has the same API as the client object, except:
- get() and iter() do not have a session parameter; the session creates an aiohttp.ClientSession object and passes it to get() and iter() automatically.
- It does not have a session() method.
Using the same aiohttp.ClientSession object for all Zyte API requests improves performance by keeping a pool of reusable connections to Zyte API.
The aiohttp.ClientSession object is created with sane defaults for Zyte API, but you can use kwargs to pass additional parameters to aiohttp.ClientSession and even override those sane defaults.
You do not need to use session() as a context manager as long as you call close() on the object it returns when you are done:

    session = client.session()
    try:
        ...
    finally:
        session.close()
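For comparison, the context-manager form might look like the following sketch. The fetch_all helper is hypothetical, and browserHtml is used only as an example of a field to request; client is assumed to be a ZyteAPI instance.

```python
def fetch_all(client, urls):
    """Fetch ``urls`` one by one over a single shared session."""
    # The session is closed automatically when the ``with`` block exits.
    with client.session() as session:
        # No ``session`` argument is needed here: the session object passes
        # its own aiohttp.ClientSession to get() automatically.
        return [session.get({"url": url, "browserHtml": True}) for url in urls]
```

All requests made inside the with block reuse the same connection pool, which is the performance benefit described above.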
Async API
- class AsyncZyteAPI(*, api_key=None, api_url='https://api.zyte.com/v1/', n_conn=15, retrying: AsyncRetrying | None = None, user_agent: str | None = None)
Parameters work the same as for ZyteAPI.
- async get(query: dict, *, endpoint: str = 'extract', session=None, handle_retries=True, retrying: AsyncRetrying | None = None) Future
Asynchronous equivalent to ZyteAPI.get().
- iter(queries: List[dict], *, endpoint: str = 'extract', session: ClientSession | None = None, handle_retries=True, retrying: AsyncRetrying | None = None) Iterator[Future]
Asynchronous equivalent to ZyteAPI.iter().
Note
Unlike ZyteAPI.iter(), which yields exceptions, the futures yielded here raise their exceptions when awaited.
- session(**kwargs)
Asynchronous equivalent to ZyteAPI.session().
You do not need to use session() as an async context manager as long as you await close() on the object it returns when you are done:

    session = client.session()
    try:
        ...
    finally:
        await session.close()
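The difference in error handling between the sync and async iter() can be sketched as follows. The fetch_all_async helper is hypothetical, client is assumed to be an AsyncZyteAPI instance, and httpResponseBody is only an example of a requested field.

```python
async def fetch_all_async(client, urls):
    """Return (results, errors) for ``urls`` using an async client."""
    queries = [{"url": url, "httpResponseBody": True} for url in urls]
    results, errors = [], []
    for future in client.iter(queries):
        try:
            # Awaiting a yielded future raises its exception, if any.
            results.append(await future)
        except Exception as exc:
            errors.append(exc)
    return results, errors


# Hypothetical usage (needs the zyte-api package and a valid API key):
#     import asyncio
#     from zyte_api import AsyncZyteAPI
#     results, errors = asyncio.run(
#         fetch_all_async(AsyncZyteAPI(), ["https://example.com"])
#     )
```

Wrapping each await in its own try block keeps one failed query from discarding the results of the others.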
Retries
- zyte_api_retrying
- class RetryFactory
Factory class that builds the tenacity.AsyncRetrying object that defines the default retry policy.
To create a custom retry policy, you can subclass this factory class, modify it as needed, and then call build() on your subclass to get the corresponding tenacity.AsyncRetrying object.
For example, to increase the maximum number of attempts for temporary download errors from 4 (i.e. 3 retries) to 10 (i.e. 9 retries):

    from tenacity import stop_after_attempt
    from zyte_api import RetryFactory


    class CustomRetryFactory(RetryFactory):
        temporary_download_error_stop = stop_after_attempt(10)


    CUSTOM_RETRY_POLICY = CustomRetryFactory().build()

To retry permanent download errors, treating them the same as temporary download errors:

    from tenacity import RetryCallState, retry_if_exception, stop_after_attempt
    from zyte_api import RequestError, RetryFactory


    def is_permanent_download_error(exc: BaseException) -> bool:
        return isinstance(exc, RequestError) and exc.status == 521


    class CustomRetryFactory(RetryFactory):
        retry_condition = RetryFactory.retry_condition | retry_if_exception(
            is_permanent_download_error
        )

        def wait(self, retry_state: RetryCallState) -> float:
            if is_permanent_download_error(retry_state.outcome.exception()):
                return self.temporary_download_error_wait(retry_state=retry_state)
            return super().wait(retry_state)

        def stop(self, retry_state: RetryCallState) -> bool:
            if is_permanent_download_error(retry_state.outcome.exception()):
                return self.temporary_download_error_stop(retry_state)
            return super().stop(retry_state)


    CUSTOM_RETRY_POLICY = CustomRetryFactory().build()

Errors
- exception RequestError(*args, **kwargs)
Exception raised upon receiving a rate-limiting or unsuccessful response from Zyte API.
- property parsed
Response as a ParsedError object.
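A minimal sketch of how a caught RequestError might be summarized for logging is shown below. The describe_error helper is hypothetical. It relies on the status attribute used in the retry example above and on the data and parse_error attributes of ParsedError, and it makes no assumption about which keys the error body contains.

```python
def describe_error(exc) -> str:
    """Build a one-line summary of a ``RequestError``-like exception."""
    parsed = exc.parsed
    if parsed.data is None:
        # The body could not be decoded into a JSON object; parse_error
        # says why.
        return f"HTTP {exc.status}: unreadable error body ({parsed.parse_error})"
    return f"HTTP {exc.status}: {parsed.data}"
```

With the sync API, such a helper could be applied to each exception yielded by iter(); with the async API, to exceptions caught around an awaited future.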
- class ParsedError(response_body: bytes, data: dict | None, parse_error: str | None)
Parsed error response body from Zyte API.
- data: dict | None
JSON-decoded response body.
If None, parse_error indicates the reason.
- classmethod from_body(response_body: bytes) ParsedError
Return a ParsedError object built out of the specified error response body.
- parse_error: str | None
If data is None, this indicates whether the reason is that response_body is not valid JSON ("bad_json") or that it is not a JSON object ("bad_format").
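The relationship between data and parse_error can be illustrated with a standalone sketch. The classify_error_body function below is hypothetical and is not the library's implementation; it only mirrors the documented rules using the standard json module.

```python
import json
from typing import Optional, Tuple


def classify_error_body(
    response_body: bytes,
) -> Tuple[Optional[dict], Optional[str]]:
    """Return (data, parse_error) following the rules described above."""
    try:
        data = json.loads(response_body)
    except ValueError:
        # Raised for invalid JSON and for bytes that cannot be decoded.
        return None, "bad_json"
    if not isinstance(data, dict):
        # Valid JSON, but not a JSON object (e.g. a list or a string).
        return None, "bad_format"
    return data, None
```

In this sketch, exactly one of the two returned values is None for any input.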