Python client library¶
Once you have installed python-zyte-api and configured your API key, you can use one of its APIs from Python code:
The sync API can be used to build simple, proof-of-concept or debugging Python scripts.
The async API can be used from coroutines, and is meant for production usage, as well as for asyncio environments like Jupyter notebooks.
Sync API¶
Create a ZyteAPI object, and use its get() method to perform a single request:
from zyte_api import ZyteAPI
client = ZyteAPI()
result = client.get({"url": "https://toscrape.com", "httpResponseBody": True})
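Zyte API returns httpResponseBody as a Base64-encoded string, so a script usually needs to decode it before using the body. The following is a minimal sketch of that step. To run without a network call or an API key, it builds a hand-made `sample_result` dictionary as a hypothetical stand-in for what `client.get()` returns; `decode_body` is an illustrative helper, not part of the library.

```python
from base64 import b64decode, b64encode

# Hypothetical stand-in for the dictionary returned by client.get();
# a real response may contain additional fields.
sample_result = {
    "url": "https://toscrape.com",
    "httpResponseBody": b64encode(b"<html></html>").decode("ascii"),
}


def decode_body(result: dict) -> bytes:
    """Return the raw response body bytes from a result dictionary."""
    return b64decode(result["httpResponseBody"])


body = decode_body(sample_result)
```

With a real client, the same helper could be applied to the `result` variable from the example above.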
To perform multiple requests, use a session() for better performance, and use iter() to send multiple requests in parallel:
from zyte_api import ZyteAPI, RequestError

client = ZyteAPI()
with client.session() as session:
    queries = [
        {"url": "https://toscrape.com", "httpResponseBody": True},
        {"url": "https://books.toscrape.com", "httpResponseBody": True},
    ]
    for result_or_exception in session.iter(queries):
        if isinstance(result_or_exception, dict):
            ...
        elif isinstance(result_or_exception, RequestError):
            ...
        else:
            assert isinstance(result_or_exception, Exception)
            ...
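Because the sync iter() yields either a result dictionary or an exception, one common way to fill in the branches above is to collect both kinds of outcome and deal with the failures after the loop. The sketch below shows that idea with a hypothetical helper, `split_outcomes`, and a hand-made list standing in for what `session.iter(queries)` might yield, so that it runs without the library or a network connection.

```python
def split_outcomes(outcomes):
    """Separate successful results (dicts) from exceptions."""
    results, errors = [], []
    for outcome in outcomes:
        if isinstance(outcome, dict):
            results.append(outcome)
        else:
            # Anything that is not a dict is treated as a failure.
            errors.append(outcome)
    return results, errors


# Hypothetical stand-in for the values yielded by session.iter(queries).
fake_outcomes = [
    {"url": "https://toscrape.com"},
    RuntimeError("simulated failure"),
    {"url": "https://books.toscrape.com"},
]
sync_results, sync_errors = split_outcomes(fake_outcomes)
```

In a real script, `split_outcomes(session.iter(queries))` would be called inside the `with client.session()` block.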
Async API¶
Create an AsyncZyteAPI object, and use its get() method to perform a single request:
import asyncio

from zyte_api import AsyncZyteAPI

async def main():
    client = AsyncZyteAPI()
    result = await client.get({"url": "https://toscrape.com", "httpResponseBody": True})

asyncio.run(main())
To perform multiple requests, use a session() for better performance, and use iter() to send multiple requests in parallel:
import asyncio

from zyte_api import AsyncZyteAPI, RequestError

async def main():
    # The async API requires AsyncZyteAPI; ZyteAPI sessions are not async context managers.
    client = AsyncZyteAPI()
    async with client.session() as session:
        queries = [
            {"url": "https://toscrape.com", "httpResponseBody": True},
            {"url": "https://books.toscrape.com", "httpResponseBody": True},
        ]
        for future in session.iter(queries):
            try:
                result = await future
            except RequestError as e:
                ...
            except Exception as e:
                ...

asyncio.run(main())
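The loop above awaits each future inside its own try block, so one failed request does not stop the others. The same shape can be tried out with the standard library alone. The sketch below uses `asyncio.as_completed` and a hypothetical `fake_get` coroutine in place of real requests; it illustrates the error-handling pattern only and makes no claim about how the library implements iter().

```python
import asyncio


async def fake_get(query: dict) -> dict:
    """Hypothetical stand-in for a request; fails for URLs containing 'fail'."""
    await asyncio.sleep(0)
    if "fail" in query["url"]:
        raise RuntimeError(f"simulated failure for {query['url']}")
    return {"url": query["url"]}


async def collect(queries):
    """Await every request, keeping results and exceptions separate."""
    results, errors = [], []
    for future in asyncio.as_completed([fake_get(q) for q in queries]):
        try:
            results.append(await future)
        except Exception as e:
            # A failure is recorded and the loop moves on to the next future.
            errors.append(e)
    return results, errors


async_results, async_errors = asyncio.run(
    collect([{"url": "https://toscrape.com"}, {"url": "https://fail.example"}])
)
```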
Optimization¶
ZyteAPI and AsyncZyteAPI use 15 concurrent connections by default.
To change that, use the n_conn parameter when creating your client object:
client = ZyteAPI(n_conn=30)
The number of concurrent connections is enforced across all method calls, including different sessions of the same client.
For guidelines on how to choose the optimal value for you, and other optimization tips, see Optimizing Zyte API usage.
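To make the idea of a client-wide connection limit concrete, here is a minimal sketch of how such a cap can be enforced with a shared semaphore. `LimitedClient` is a hypothetical class written for this illustration, and the semaphore approach is an assumption, not a description of how ZyteAPI or AsyncZyteAPI work internally.

```python
import asyncio


class LimitedClient:
    """Hypothetical client that caps in-flight requests with one semaphore."""

    def __init__(self, n_conn: int = 15):
        self._semaphore = asyncio.Semaphore(n_conn)
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous requests observed

    async def get(self, query: dict) -> dict:
        # Every call shares the same semaphore, so the cap is client-wide.
        async with self._semaphore:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            await asyncio.sleep(0.01)  # simulated network latency
            self.in_flight -= 1
            return {"url": query["url"]}


async def demo() -> int:
    client = LimitedClient(n_conn=3)
    queries = [{"url": f"https://example.com/{i}"} for i in range(10)]
    await asyncio.gather(*(client.get(q) for q in queries))
    return client.peak


peak = asyncio.run(demo())
```

Ten requests are started at once, but the recorded peak never goes above the configured limit of 3.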
Errors and retries¶
Methods of ZyteAPI and AsyncZyteAPI automatically handle retries for rate-limiting and unsuccessful responses, as well as network errors.
The default retry policy, zyte_api_retrying, does the following:
Retries rate-limiting responses forever.
Retries temporary download errors up to 3 times.
Retries network errors until they have happened for 15 minutes straight.
All retries are done with an exponential backoff algorithm.
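As an illustration of exponential backoff in general, the sketch below computes the wait before each retry by multiplying a base delay by a constant factor and capping the result. The function `backoff_delays` and its `base`, `factor` and `cap` values are hypothetical and chosen for readability; they are not the parameters used by zyte_api_retrying, and real implementations often add random jitter as well.

```python
def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> list:
    """Return the wait time in seconds before each retry, growing up to a cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


# Waits before retries 1 through 8 with the illustrative defaults.
delays = backoff_delays(8)
```

With these illustrative defaults, the delay doubles on every attempt until it reaches the 60-second cap.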
If some unsuccessful responses exceed maximum retries with the default retry policy, try using aggressive_retrying instead, which modifies the default retry policy as follows:
Retries temporary download errors up to 7 times. Permanent download errors also count towards this retry limit.
Retries permanent download errors up to 3 times.
Retries error responses with an HTTP status code in the 500-599 range (503, 520 and 521 excluded) up to 3 times.
Alternatively, the reference documentation of RetryFactory and AggressiveRetryFactory features some examples of custom retry policies, and you can always build your own AsyncRetrying object from scratch.
To use aggressive_retrying or a custom retry policy, pass an instance of your AsyncRetrying subclass when creating your client object:
from zyte_api import ZyteAPI, aggressive_retrying
client = ZyteAPI(retrying=aggressive_retrying)
When retries are exceeded for a given request, an exception is raised. The one exception is the iter() method of the sync API, which yields exceptions instead of raising them, so that a single failed request does not interrupt the entire iteration.
The type of exception depends on the issue that caused the final request attempt to fail. Unsuccessful responses trigger a RequestError and network errors trigger aiohttp exceptions.
Other exceptions could be raised; for example, from a custom retry policy.
See also