Push-Pull: Batch

Push-Pull is our recommended integration method for reliably handling large amounts of data.

Visit the Oxylabs GitHub repository for a complete working example of Push-Pull integration in Python.

Push-Pull is an asynchronous integration method. Upon job submission, you will promptly receive a JSON response containing all job details, including job parameters, ID, and URLs for result download and status checking. Once your job is processed, we will update you via a JSON payload sent to your server, if you provided a callback URL. Results remain available for retrieval for at least 24 hours after completion.
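As a rough sketch of the receiving side, the snippet below uses Python's standard http.server module to accept a JSON notification sent by POST. This section does not describe the payload layout, so the handler only decodes the body and stores it. The names `CallbackHandler`, `received`, `decode_notification`, and the port number are illustrative assumptions, not part of the API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # hypothetical in-memory store for decoded notifications


def decode_notification(body: bytes) -> dict:
    """Decode a JSON callback body; raises ValueError if it is not a JSON object."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        try:
            received.append(decode_notification(self.rfile.read(length)))
            self.send_response(200)
        except ValueError:
            # Invalid JSON or unexpected shape: reject the notification.
            self.send_response(400)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Blocks forever; call this from your own entry point."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

In practice you would put this behind HTTPS and persist notifications somewhere more durable than a list.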

With Push-Pull, you can upload your results directly to your cloud storage (AWS S3 or Google Cloud Storage).

If you prefer not to set up a service for incoming callback notifications, you can simply retrieve your results periodically (polling).
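A minimal polling sketch, assuming the status-check URL returns a JSON object with a `status` field that eventually becomes `done` or `faulted` (the field name and values are assumptions that this section does not confirm). The `fetch` argument is a hypothetical callable that returns the decoded job JSON, which keeps the loop independent of any particular HTTP library.

```python
import time
from typing import Callable


def poll_until_finished(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job reports a terminal status or attempts run out."""
    terminal = {"done", "faulted"}  # assumed status values
    for attempt in range(max_attempts):
        job = fetch()
        if job.get("status") in terminal:
            return job
        # Wait between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job not finished after {max_attempts} attempts")
```

Passing `sleep` as a parameter makes the loop easy to exercise in tests without real delays.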

You can also explore how Push-Pull works using Postman.

Batch Query

Scraper APIs support submitting up to 5,000 query or url parameter values in a single batch request.

Endpoint

POST https://data.oxylabs.io/v1/queries/batch

The system handles every query or url value submitted as a separate job. If you provide a callback URL, you will receive a separate callback for each job. Otherwise, our initial response will contain the job IDs for all submitted values. For example, if you send 50 keywords, we will return 50 unique job IDs.

IMPORTANT: With the /batch endpoint, you can only submit lists of query or url parameter values (depending on the source you use). All other parameters must have a single value.
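The two constraints above (at most 5,000 values per request, and only query or url may be a list) can be enforced before sending anything. The helper below is a sketch under those stated rules; `build_batch_payloads` and `MAX_BATCH_SIZE` are hypothetical names introduced here for illustration.

```python
from typing import List

MAX_BATCH_SIZE = 5000  # per-request limit stated in this section


def build_batch_payloads(values: List[str], source: str, key: str = "query", **shared) -> List[dict]:
    """Split values into payloads of at most MAX_BATCH_SIZE items each."""
    if key not in ("query", "url"):
        raise ValueError("only 'query' or 'url' may carry a list of values")
    for name, value in shared.items():
        # Every other parameter must be a single value, not a list.
        if isinstance(value, (list, tuple)):
            raise ValueError(f"{name!r} must be a single value")
    return [
        {key: values[start:start + MAX_BATCH_SIZE], "source": source, **shared}
        for start in range(0, len(values), MAX_BATCH_SIZE)
    ]
```

For example, 12,000 keywords would yield three payloads of 5,000, 5,000, and 2,000 values, each sent as its own batch request.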

Input

You need to post query parameters as a JSON payload. Here is how you submit a batch job:

curl --user "user:pass1" \
'https://data.oxylabs.io/v1/queries/batch' \
-H 'Content-Type: application/json' \
-d '@keywords.json'

You may notice that the code example above doesn't show how the JSON payload should be formatted and instead points to a pre-made JSON file. Below is the content of the keywords.json file, containing multiple query parameter values:

{  
   "query":[  
      "adidas",
      "nike",
      "reebok"
   ],
   "source": "google_shopping_search",
   "domain": "com",
   "callback_url": "https://your.callback.url"
}

...and here is a keywords.json batch input file, containing multiple URLs:

{  
   "url":[  
      "https://example.com/url1.html",
      "https://example.com/url2.html",
      "https://example.com/url3.html"
   ],
   "source": "universal",
   "callback_url": "https://your.callback.url"
}
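The curl call above can be reproduced in Python with only the standard library. This is a sketch, not an official client: `build_batch_request` and `submit_batch` are hypothetical helper names, and the use of HTTP Basic authentication is inferred from curl's `--user` flag.

```python
import base64
import json
import urllib.request

BATCH_URL = "https://data.oxylabs.io/v1/queries/batch"


def build_batch_request(payload: dict, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a batch job."""
    # curl --user sends HTTP Basic credentials; reproduce that header here.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def submit_batch(payload: dict, username: str, password: str) -> dict:
    """Send the batch request and return the decoded JSON response."""
    request = build_batch_request(payload, username, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

To mirror the curl example, load the file with `json.load(open("keywords.json"))` and pass the result as `payload`.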

Output

The API will respond with a JSON object, containing the job information for each job created. The response will be similar to this:

{
  "queries": [
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2024-06-26 00:00:01",
      "domain": "com",
      "id": "12345678900987654321",
      {...}
      "query": "adidas",
      "source": "google_shopping_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
          "method": "GET"
        }
      ]
    },
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2024-06-26 00:00:01",
      "domain": "com",
      "id": "12345678901234567890",
      {...}
      "query": "nike",
      "source": "google_shopping_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/12345678901234567890/results",
          "method": "GET"
        }
      ]
    },
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2024-06-26 00:00:01",
      "domain": "com",
      "id": "01234567899876543210",
      {...}
      "query": "reebok",
      "source": "google_shopping_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/01234567899876543210/results",
          "method": "GET"
        }
      ]
    }
  ]
}
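Once the response is decoded, you will usually want each job's ID and results URL. The sketch below relies only on the fields visible in the sample above (`queries`, `id`, `query`, and link objects with `rel` and `href`). Because the name of the field holding the links is elided in the sample, the code scans every list in the job object instead of assuming a key. `find_link` and `results_by_query` are hypothetical helper names.

```python
from typing import Dict, Optional


def find_link(job: dict, rel: str) -> Optional[str]:
    """Return the href of the first link object with the given rel, or None."""
    for value in job.values():
        if not isinstance(value, list):
            continue
        for item in value:
            if isinstance(item, dict) and item.get("rel") == rel:
                return item.get("href")
    return None


def results_by_query(response: dict) -> Dict[str, dict]:
    """Map each submitted query (or url) value to its job ID and results URL."""
    jobs = {}
    for job in response.get("queries", []):
        key = job.get("query") or job.get("url")
        jobs[key] = {"id": job["id"], "results_url": find_link(job, "results")}
    return jobs
```

Keeping the mapping keyed by the submitted value makes it straightforward to match results back to your input list.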
