Command-line client
Once you have installed python-zyte-api and configured your API key, you can use the zyte-api command-line client.
To use zyte-api, pass an input file as the first parameter and specify an output file with --output.
For example:
zyte-api urls.txt --output result.jsonl
Input file
The input file can be either of the following:
A plain-text file with a list of target URLs, one per line. For example:
https://books.toscrape.com
https://quotes.toscrape.com
For each URL, a Zyte API request will be sent with browserHtml set to True.
A JSON Lines file with an object of Zyte API request parameters per line. For example:
{"url": "https://a.example", "browserHtml": true, "geolocation": "GB"}
{"url": "https://b.example", "httpResponseBody": true}
{"url": "https://books.toscrape.com", "productNavigation": true}
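If you generate input files from a script, the following Python sketch builds both formats with the standard library only. The file names urls.txt and requests.jsonl and the helper names are assumptions chosen for illustration. url_to_request only mirrors the description above (one request per URL with browserHtml set to True); it is not the client's own code.

```python
import json
from pathlib import Path


def format_url_list(urls):
    """Plain-text input: one target URL per line."""
    return "".join(f"{url}\n" for url in urls)


def format_request_lines(requests):
    """JSON Lines input: one object of Zyte API request parameters per line."""
    return "".join(json.dumps(request) + "\n" for request in requests)


def url_to_request(url):
    """Illustration only: the request that a plain-text line stands for."""
    return {"url": url, "browserHtml": True}


if __name__ == "__main__":
    # Hypothetical file names; pass either file to zyte-api as its first parameter.
    Path("urls.txt").write_text(
        format_url_list(["https://books.toscrape.com", "https://quotes.toscrape.com"]),
        encoding="utf-8",
    )
    Path("requests.jsonl").write_text(
        format_request_lines([
            {"url": "https://a.example", "browserHtml": True, "geolocation": "GB"},
            {"url": "https://b.example", "httpResponseBody": True},
            {"url": "https://books.toscrape.com", "productNavigation": True},
        ]),
        encoding="utf-8",
    )
```

json.dumps writes Python's True as JSON true, so the generated lines match the JSON Lines example above.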
Output file
You can specify the path to an output file with the --output/-o switch.
If not specified, the output is printed on the standard output.
Warning
If a file already exists at the output path, it is overwritten.
The output file is in JSON Lines format. Each line contains a JSON object with a response from Zyte API.
By default, zyte-api uses multiple concurrent connections for performance reasons and, as a result, the order of responses will probably not match the order of the source requests from the input file. If you need to match the output results to the input requests, the best way is to use echoData. By default, zyte-api fills echoData with the input URL.
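A minimal Python sketch of that matching step, assuming each output line is a JSON object that carries the echoData value of its request (the input URL, by default). The helper names and sample data are hypothetical. If the same URL appears more than once in your input, keying by URL is ambiguous; giving each request its own echoData value in a JSON Lines input file is presumably the way around that.

```python
import json


def index_by_echo_data(lines):
    """Map each response's echoData value to the parsed response object."""
    responses = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        response = json.loads(line)
        responses[response["echoData"]] = response
    return responses


def in_input_order(input_urls, responses):
    """Return responses in input order, with None where no response was found."""
    return [responses.get(url) for url in input_urls]
```

For example, open result.jsonl, pass the file object to index_by_echo_data, and then call in_input_order with the URLs read from urls.txt.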
Optimization
By default, zyte-api uses 20 concurrent connections for requests. Use the --n-conn switch to change that:
zyte-api --n-conn 40 …
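To picture what a connection limit like --n-conn does, and why responses come back out of input order, here is a self-contained asyncio sketch. It is not the client's implementation: fake_request is a hypothetical stand-in that sleeps instead of calling Zyte API, and an asyncio.Semaphore caps how many handlers run at the same time.

```python
import asyncio


async def run_bounded(items, n_conn, handler):
    """Run handler(item) for every item, at most n_conn at a time.

    Results are collected in completion order, not input order.
    """
    semaphore = asyncio.Semaphore(n_conn)
    results = []

    async def worker(item):
        async with semaphore:  # waits while n_conn handlers are already running
            results.append(await handler(item))

    await asyncio.gather(*(worker(item) for item in items))
    return results


async def fake_request(delay):
    """Hypothetical stand-in for a request: sleeps, then returns its input."""
    await asyncio.sleep(delay)
    return delay


if __name__ == "__main__":
    # The slowest "request" is listed first, so it finishes last.
    print(asyncio.run(run_bounded([0.3, 0.2, 0.1], n_conn=3, handler=fake_request)))
```

Raising the limit lets more requests overlap, but only up to what the target websites and your Zyte API plan can sustain.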
If you target multiple websites and your input file is sorted by website, the --shuffle option can be useful: it randomizes the request order, which distributes the load more evenly across websites:
zyte-api urls.txt --shuffle …
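The following Python sketch illustrates the effect of shuffling; it is not the client's code. A URL list sorted by website is shuffled, and longest_same_site_run (a hypothetical helper) measures the longest streak of consecutive requests to the same host. The seed parameter exists only to make the example reproducible.

```python
import random
from urllib.parse import urlsplit


def shuffled(requests, seed=None):
    """Return a shuffled copy of requests; the input list is left unchanged."""
    result = list(requests)
    random.Random(seed).shuffle(result)
    return result


def longest_same_site_run(urls):
    """Length of the longest run of consecutive URLs with the same host."""
    longest = current = 0
    previous = None
    for url in urls:
        host = urlsplit(url).hostname
        current = current + 1 if host == previous else 1
        longest = max(longest, current)
        previous = host
    return longest


if __name__ == "__main__":
    # Input sorted by website: all pages of one site, then all of the other.
    urls = [f"https://books.toscrape.com/page-{i}" for i in range(50)]
    urls += [f"https://quotes.toscrape.com/page/{i}/" for i in range(50)]
    print("sorted:", longest_same_site_run(urls))
    print("shuffled:", longest_same_site_run(shuffled(urls, seed=1)))
```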
For guidelines on choosing the optimal --n-conn value for your use case, and for other optimization tips, see Optimizing Zyte API usage.
Errors and retries
zyte-api automatically retries rate-limiting responses, unsuccessful responses, and network errors, following the default retry policy.
Use --dont-retry-errors to disable retries of error responses, so that only rate-limiting responses are retried:
zyte-api --dont-retry-errors …
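A simplified, hypothetical Python sketch of the distinction that --dont-retry-errors draws: rate-limiting responses (HTTP 429) are retried in both modes, while other errors are retried only when the switch is not used. The real default retry policy covers more details (which errors qualify, attempt limits, wait times), so treating 5xx statuses and network errors as the retryable errors is an assumption of this sketch.

```python
RATE_LIMITED = 429  # HTTP status code for rate limiting


def should_retry(status, dont_retry_errors=False):
    """Decide whether to retry a failed request.

    status is an HTTP status code, or None for a network error.
    """
    if status == RATE_LIMITED:
        return True  # rate limiting is retried with or without the switch
    # Assumption: 5xx responses and network errors count as retryable errors.
    is_error = status is None or status >= 500
    return is_error and not dont_retry_errors
```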
By default, errors are only logged in the standard error output (stderr).
If you want to include error responses in the output file, use --store-errors:
zyte-api --store-errors …
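If you drive zyte-api from a Python script, a small helper can assemble the command line from the switches described on this page. build_command is hypothetical and not part of python-zyte-api; it only builds the argument list. Actually running it assumes zyte-api is on your PATH and your API key is already configured.

```python
def build_command(input_path, output_path=None, n_conn=None, shuffle=False,
                  dont_retry_errors=False, store_errors=False):
    """Build a zyte-api argument list using only the switches documented here."""
    command = ["zyte-api", input_path]
    if output_path is not None:
        command += ["--output", output_path]  # without it, output goes to stdout
    if n_conn is not None:
        command += ["--n-conn", str(n_conn)]
    if shuffle:
        command.append("--shuffle")
    if dont_retry_errors:
        command.append("--dont-retry-errors")
    if store_errors:
        command.append("--store-errors")
    return command


if __name__ == "__main__":
    command = build_command("urls.txt", "result.jsonl", n_conn=40, shuffle=True)
    print(" ".join(command))
    # To run it: import subprocess; subprocess.run(command, check=True)
```

Passing a list rather than a single string to subprocess.run avoids shell quoting issues with paths that contain spaces.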
See also