Command-line client

Once you have installed python-zyte-api and configured your API key, you can use the zyte-api command-line client.

To use zyte-api, pass an input file as the first parameter and specify an output file with --output. For example:

zyte-api urls.txt --output result.jsonl

Input file

The input file can be either of the following:

  • A plain-text file with a list of target URLs, one per line. For example:

    https://books.toscrape.com
    https://quotes.toscrape.com
    

    For each URL, zyte-api sends a Zyte API request with browserHtml set to true.

  • A JSON Lines file with one object of Zyte API request parameters per line. For example:

    {"url": "https://a.example", "browserHtml": true, "geolocation": "GB"}
    {"url": "https://b.example", "httpResponseBody": true}
    {"url": "https://books.toscrape.com", "productNavigation": true}
    
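If you already have a list of URLs and want every request to share the same parameters, you can generate the JSON Lines input with a short script. The following is a minimal sketch, not part of python-zyte-api: the helper names build_requests and to_jsonl are hypothetical, and the parameters are only the ones shown in the examples above.

```python
import json


def build_requests(urls, **params):
    """Return one request-parameter dict per URL, all sharing the extra params."""
    return [{"url": url, **params} for url in urls]


def to_jsonl(requests):
    """Serialize the request dicts as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(request) + "\n" for request in requests)


if __name__ == "__main__":
    urls = ["https://books.toscrape.com", "https://quotes.toscrape.com"]
    # Redirect the printed text to a file to use it as zyte-api input.
    print(to_jsonl(build_requests(urls, httpResponseBody=True)), end="")
```

Redirecting the script output to a file gives you something you can pass to zyte-api as the first parameter.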

Output file

You can specify the path to an output file with the --output/-o switch. If you do not specify one, the output is printed to standard output.

Warning

If a file already exists at the output path, it is overwritten.

The output file is in JSON Lines format. Each line contains a JSON object with a response from Zyte API.

By default, zyte-api uses multiple concurrent connections for performance reasons and, as a result, the order of responses will probably not match the order of the source requests from the input file. If you need to match the output results to the input requests, the best way is to use echoData. By default, zyte-api fills echoData with the input URL.
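
As an illustration of matching by echoData, the sketch below indexes the output lines by their echoData value and then reads them back in input order. It assumes the default behavior described above, where echoData holds the input URL as a string; the helper names and the sample data are hypothetical.

```python
import json


def index_by_echo_data(jsonl_text):
    """Map each response's echoData value to its parsed response object."""
    responses = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        response = json.loads(line)
        # ASSUMPTION: echoData is the input URL (a string), as set by default.
        responses[response.get("echoData")] = response
    return responses


def in_input_order(input_urls, responses):
    """Return responses in input order; None marks a URL with no response."""
    return [responses.get(url) for url in input_urls]


if __name__ == "__main__":
    # Hypothetical output, out of order with respect to the input.
    output = (
        '{"url": "https://b.example", "echoData": "https://b.example"}\n'
        '{"url": "https://a.example", "echoData": "https://a.example"}\n'
    )
    ordered = in_input_order(
        ["https://a.example", "https://b.example"],
        index_by_echo_data(output),
    )
    for response in ordered:
        print(response["url"])
```

If you set echoData yourself in a JSON Lines input file, use a value that uniquely identifies each request, since a dictionary keyed this way keeps only the last response for a repeated value.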

Optimization

By default, zyte-api uses 20 concurrent connections for requests. Use the --n-conn switch to change that:

zyte-api --n-conn 40 

If you target multiple websites and your input file is sorted by website, use the --shuffle option to randomize the request order and so distribute the load more evenly across websites:

zyte-api urls.txt --shuffle 

For guidelines on how to choose the optimal --n-conn value for you, and other optimization tips, see Optimizing Zyte API usage.

Errors and retries

zyte-api automatically handles retries for rate-limiting and unsuccessful responses, as well as network errors, following the default retry policy.

Use --dont-retry-errors to disable retries of error responses, so that only rate-limiting responses are retried:

zyte-api --dont-retry-errors 

By default, errors are logged only to the standard error output (stderr). If you want to include error responses in the output file, use --store-errors:

zyte-api --store-errors 
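
When the output file mixes successful and error responses, you may want to separate them afterwards. The sketch below is one possible approach, not part of python-zyte-api. It assumes that stored error lines are JSON objects with "type" and "status" keys and that successful responses lack a top-level "type" key; check this against your own output before relying on it. The helper names and sample data are hypothetical.

```python
import json


def looks_like_error(obj):
    """Heuristic check for an error object.

    ASSUMPTION: error objects carry "type" and "status" keys, and
    successful responses do not have a top-level "type" key.
    """
    return "type" in obj and "status" in obj


def split_results(jsonl_text):
    """Return (successes, errors) parsed from JSON Lines text."""
    successes, errors = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        obj = json.loads(line)
        (errors if looks_like_error(obj) else successes).append(obj)
    return successes, errors


if __name__ == "__main__":
    # Hypothetical output lines: one success, one error.
    output = (
        '{"url": "https://a.example", "httpResponseBody": "aGk="}\n'
        '{"type": "/download/internal-error", "status": 520, "title": "Error"}\n'
    )
    successes, errors = split_results(output)
    print(len(successes), "successful,", len(errors), "failed")
```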

See also

CLI reference