asyncio API

To use the asyncio client API, create an instance of AsyncClient. Use its request_raw method to perform individual requests:

import asyncio
from zyte_api.aio.client import AsyncClient

client = AsyncClient()

async def single_request(url):
    return await client.request_raw({
        'url': url,
        'browserHtml': True
    })

response = asyncio.run(single_request("https://books.toscrape.com"))
# Do something with the response...

There is also a request_parallel_as_completed method, which processes many URLs in parallel using multiple connections:

import asyncio
import json
import sys

from zyte_api.aio.client import AsyncClient, create_session
from zyte_api.aio.errors import RequestError

async def extract_from(urls, n_conn):
    client = AsyncClient(n_conn=n_conn)
    requests = [
        {"url": url, "browserHtml": True}
        for url in urls
    ]
    async with create_session(n_conn) as session:
        res_iter = client.request_parallel_as_completed(requests, session=session)
        for fut in res_iter:
            try:
                res = await fut
                # do something with a result, e.g.
                print(json.dumps(res))
            except RequestError as e:
                print(e, file=sys.stderr)
                raise

urls = ["https://toscrape.com", "https://books.toscrape.com"]
asyncio.run(extract_from(urls, n_conn=15))

request_parallel_as_completed is modelled after asyncio.as_completed (see https://docs.python.org/3/library/asyncio-task.html#asyncio.as_completed), and actually uses it under the hood.
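
For readers unfamiliar with asyncio.as_completed, the following is a minimal sketch of the pattern using only the standard library. The fake_fetch coroutine and its delays are hypothetical stand-ins for real requests, not part of zyte_api; the point is only that results arrive in completion order rather than submission order.

```python
import asyncio


async def fake_fetch(url, delay):
    # Hypothetical stand-in for a network request: sleeps, then returns a dict.
    await asyncio.sleep(delay)
    return {"url": url, "delay": delay}


async def collect_in_completion_order(jobs):
    # Build one awaitable per job, then consume them as each one finishes.
    awaitables = [fake_fetch(url, delay) for url, delay in jobs]
    results = []
    for fut in asyncio.as_completed(awaitables):
        results.append(await fut)
    return results


jobs = [
    ("https://toscrape.com", 0.2),         # submitted first, finishes last
    ("https://books.toscrape.com", 0.05),  # submitted second, finishes first
]
results = asyncio.run(collect_in_completion_order(jobs))
print([r["url"] for r in results])
```

Because results are yielded as soon as they are ready, a slow URL does not delay the processing of faster ones.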

Both request_parallel_as_completed and request_raw handle throttling (HTTP 429 errors) and network errors by retrying the request.
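
To illustrate the general idea of retrying on throttling and network errors, here is a rough sketch with exponential backoff. It is not the client's actual retry policy: ThrottledError, with_retries, flaky_request and all the parameters are hypothetical names chosen for this example, and the delays are kept tiny so it runs quickly.

```python
import asyncio
import random


class ThrottledError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


async def with_retries(make_request, max_attempts=5, base_delay=0.01):
    # Retry on throttling or network errors, doubling the wait each time.
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_request()
        except (ThrottledError, ConnectionError):
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            delay = base_delay * 2 ** (attempt - 1)
            # A little random jitter avoids many clients retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, base_delay))


calls = {"count": 0}


async def flaky_request():
    # Hypothetical request that is throttled twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ThrottledError("429 Too Many Requests")
    return {"status": "ok"}


result = asyncio.run(with_retries(flaky_request))
print(result, "after", calls["count"], "attempts")
```

With the built-in handling described above, callers of request_raw and request_parallel_as_completed should not need a wrapper like this for 429 and network errors; the sketch only shows what such handling typically involves.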

The CLI implementation (zyte_api/__main__.py) can serve as a usage example.