Integration Methods

Web Scraper API supports three integration methods, each with its own benefits:

Realtime. This method is synchronous: after you send a job submission request, you must keep the connection open until we either finish your job or return an error. Realtime works well if you want to send our API JSON payloads that describe scraping and parsing jobs, including advanced scraping parameters.
Push-Pull (supports batch queries). This method is asynchronous: after you submit a job, we quickly return a JSON object with your job info, including URLs for checking the job status and downloading the results. Once we finish processing the job, we POST a completion notification to your server, after which you can download the results. Push-Pull can also upload results directly to your cloud storage (AWS S3 or Google Cloud Storage).
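A Realtime submission is a single blocking POST that carries the JSON job description. The following is a minimal standard-library Python sketch, not the official client: the endpoint URL `https://realtime.oxylabs.io/v1/queries`, the use of HTTP Basic authentication, and the `source`/`url` payload fields are assumptions to verify against the Realtime reference.

```python
import base64
import json
import urllib.request

# Assumed Realtime endpoint; confirm it in the Realtime documentation.
REALTIME_URL = "https://realtime.oxylabs.io/v1/queries"


def build_realtime_request(username: str, password: str, payload: dict) -> urllib.request.Request:
    """Build a synchronous job submission: a POST carrying a JSON job description."""
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the API accepts HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        REALTIME_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def submit_realtime(request: urllib.request.Request, timeout: float = 150.0) -> dict:
    """Send the request and block until the job result (or an error) comes back."""
    # The connection stays open for the whole job, so this call blocks here.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `submit_realtime(build_realtime_request("USERNAME", "PASSWORD", {"source": "universal", "url": "https://example.com"}))` would hold the connection open until the job finishes or fails; the credentials and payload here are placeholders.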

Push-Pull is our recommended integration method for reliably handling large amounts of data.
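The Push-Pull flow can be sketched as three small helpers: one builds a job description that includes a callback address, one picks the status or results URL out of the returned job info, and one recognizes a completion notification POSTed to your server. This is a hypothetical standard-library Python sketch. The parameter names `source` and `callback_url`, the `_links` list of `rel`/`href` objects in the job info, and the notification's `id`/`status` fields with the value `"done"` are all assumptions, not the documented schema.

```python
import json
from typing import Optional


def build_job_payload(url: str, callback_url: str) -> dict:
    """Job description asking the service to POST a completion notice to us."""
    # Assumed parameter names: "source", "url", "callback_url".
    return {"source": "universal", "url": url, "callback_url": callback_url}


def find_link(job_info: dict, rel: str) -> Optional[str]:
    """Return the URL for a given relation (e.g. status or results) from job info.

    Assumes the job info carries a "_links" list of {"rel": ..., "href": ...} objects.
    """
    for link in job_info.get("_links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def job_id_if_done(notification_body: bytes) -> Optional[str]:
    """Parse a completion notification received by the callback server.

    Returns the job ID when the (assumed) "status" field reports completion,
    signalling that the results are ready to download; otherwise returns None.
    """
    notice = json.loads(notification_body.decode("utf-8"))
    if notice.get("status") == "done":
        return notice.get("id")
    return None
```

Keeping notification parsing separate from the HTTP server lets the same logic sit behind whatever web framework already runs on your callback host.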

Proxy Endpoint. This method is also synchronous (like Realtime), but instead of calling our service through a RESTful interface, you use our endpoint as a proxy. Choose Proxy Endpoint if you have used proxies before and simply want to get unblocked content from us.
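With Proxy Endpoint, requests look like ordinary traffic routed through an authenticated proxy. Below is a minimal standard-library Python sketch; the host `realtime.oxylabs.io` and port `60000` are assumptions to replace with the values from your account. Credentials are percent-encoded so that characters such as `@` or `:` in a password do not break the proxy URL.

```python
import urllib.parse
import urllib.request

# Assumed proxy host and port; use the values shown for your account.
PROXY_HOST = "realtime.oxylabs.io"
PROXY_PORT = 60000


def proxy_url(username: str, password: str) -> str:
    """Build an authenticated proxy URL with percent-encoded credentials."""
    user = urllib.parse.quote(username, safe="")
    pwd = urllib.parse.quote(password, safe="")
    return f"http://{user}:{pwd}@{PROXY_HOST}:{PROXY_PORT}"


def build_proxy_opener(username: str, password: str) -> urllib.request.OpenerDirector:
    """Create an opener that sends both HTTP and HTTPS requests via the proxy."""
    proxy = proxy_url(username, password)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(opener: urllib.request.OpenerDirector, target_url: str,
                    timeout: float = 150.0) -> bytes:
    """Fetch a page through the proxy and return the raw response body."""
    with opener.open(target_url, timeout=timeout) as response:
        return response.read()
```

Because the proxy sits below the HTTP layer, existing code that already accepts a proxy setting can usually adopt this method without changing how it builds requests.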

The Time-To-Live (TTL) for all API connections is set to 150 seconds. In rare cases, a connection may time out before it receives a response; high system load and extremely complex job submissions are among the factors that can contribute to such timeouts.
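Since a synchronous connection can occasionally time out, a client may want to retry such calls. The sketch below is a generic retry wrapper with exponential backoff, not part of the API; `call_with_retries` and its defaults are hypothetical. With `urllib.request` on Python 3.10+, a timeout while waiting for the response typically surfaces as `TimeoutError`, whereas a timeout while connecting arrives wrapped in `urllib.error.URLError`, which this sketch does not retry.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run a blocking call, retrying with exponential backoff when it times out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be tested without actually waiting.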

Last updated 4 months ago