Command-line client
Once you have installed python-zyte-api and configured your
API key or Ethereum private key, you can use the
zyte-api command-line client.
To use zyte-api, pass an input file as the first
parameter and specify an output file with --output.
For example:
zyte-api urls.txt --output result.jsonl
Input file
The input file can be either of the following:
A plain-text file with a list of target URLs, one per line. For example:
https://books.toscrape.com
https://quotes.toscrape.com
For each URL, a Zyte API request will be sent with browserHtml set to
True.
A JSON Lines file with an object of Zyte API request parameters per line. For example:
{"url": "https://a.example", "browserHtml": true, "geolocation": "GB"}
{"url": "https://b.example", "httpResponseBody": true}
{"url": "https://books.toscrape.com", "productNavigation": true}
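A JSON Lines input file of this kind can also be generated from a script. The following is a minimal sketch using only the Python standard library; the helper name write_requests and the file name requests.jsonl are illustrative choices, not part of zyte-api.

```python
import json


def write_requests(path, requests):
    """Write one JSON object of Zyte API request parameters per line."""
    with open(path, "w", encoding="utf-8") as f:
        for params in requests:
            # json.dumps emits a single line, as JSON Lines requires
            f.write(json.dumps(params) + "\n")


if __name__ == "__main__":
    write_requests(
        "requests.jsonl",  # hypothetical file name
        [
            {"url": "https://a.example", "browserHtml": True, "geolocation": "GB"},
            {"url": "https://b.example", "httpResponseBody": True},
        ],
    )
```

The resulting file can then be passed to zyte-api as the input file in place of urls.txt.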
Output file
You can specify the path to an output file with the --output/-o switch.
If not specified, the output is printed on the standard output.
Warning
If a file already exists at the output path, it is overwritten.
The output file is in JSON Lines format. Each line contains a JSON object with a response from Zyte API.
By default, zyte-api uses multiple concurrent connections for
performance reasons and, as a result, the order of
responses will probably not match the order of the source requests from the
input file. If you need to match the output results to the
input requests, the best way is to use echoData. By default,
zyte-api fills echoData with the input URL.
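Matching responses back to inputs could look like the sketch below. It assumes a plain-text URL input file, so that each response carries the input URL as a string in its echoData field; the function names are hypothetical, and responses without echoData end up under the key None.

```python
import json


def index_by_echo_data(lines):
    """Map each response's echoData value to the parsed response object."""
    responses = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        response = json.loads(line)
        # Assumption: echoData is a hashable value, such as the input URL
        responses[response.get("echoData")] = response
    return responses


def in_input_order(urls, responses):
    """Return responses in input order, with None for URLs that have no response."""
    return [responses.get(url) for url in urls]
```

For example, reading result.jsonl with open(), passing the file object to index_by_echo_data, and then calling in_input_order with the URLs from urls.txt yields the responses in the original request order.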
Optimization
By default, zyte-api uses 20 concurrent connections for requests. Use the
--n-conn switch to change that:
zyte-api --n-conn 40 …
The --shuffle option can be useful if you target multiple websites and your
input file is sorted by website, to randomize the request
order and hence distribute the load somewhat evenly:
zyte-api urls.txt --shuffle …
For guidelines on how to choose the optimal --n-conn value for you, and
other optimization tips, see Optimizing Zyte API usage.
Errors and retries
zyte-api automatically handles retries for rate-limiting and unsuccessful responses, as well as network errors,
following the default retry policy.
Use --dont-retry-errors to disable the retrying of error responses, so that
only rate-limiting responses are retried:
zyte-api --dont-retry-errors …
By default, errors are only logged in the standard error output (stderr).
If you want to include error responses in the output file, use
--store-errors:
zyte-api --store-errors …
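When errors are stored alongside successful responses, the output file may need to be split afterwards. The sketch below is one possible way to do that. It rests on an assumption not stated here: that an error record can be recognized by a numeric status field of 400 or higher. Check the actual records in your output and pass your own is_error predicate if they differ. All function names are hypothetical.

```python
import json


def looks_like_error(record):
    # Assumption: error records carry a numeric "status" field >= 400
    status = record.get("status")
    return isinstance(status, int) and status >= 400


def split_records(lines, is_error=looks_like_error):
    """Split JSON Lines records into (successes, errors)."""
    ok, errors = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        (errors if is_error(record) else ok).append(record)
    return ok, errors
```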
See also