Push-Pull

Overview

Push-Pull is not the most simple integration method, but it's the most reliable one, which is why it's the one we recommend implementing, especially if you work with large amounts of data.
You can visit Oxylabs GitHub repository for a complete working example of Push-Pull integration in Python.
Push-Pull is an asynchronous integration method. This means that upon submitting your job, we will quickly return you a JSON with the job information (all submitted job parameters and the job ID, as well as the URLs for downloading the result and checking the job status). With this integration method, job submission is a completely separate process from downloading results.
Once we are done with processing your job, we will POST a JSON payload with the updated job information (including the status of the job set to done) to your server, if you provided a callback URL while submitting the job. At that point, you can go ahead and download the results from our system. We keep results available for retrieval for at least 24 hours after completion.
With Push-Pull, you can get your results uploaded straight to your cloud storage (AWS S3 or Google Cloud Storage).
If you don't want to bother setting up a service that accepts incoming callback notifications, you can just try getting your results every few seconds (a concept called polling).
You can also try and see how Push-Pull works via Postman.

Single Job

Description

The following endpoint only accepts a single query or url value.

Endpoint

POST https://data.oxylabs.io/v1/queries

Input

You have to send your job parameters in a JSON payload, as illustrated in the code examples below:
cURL
Python
PHP
C#
Golang
Java
Node.js
curl --user user:pass1 \
'https://data.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{"source": "ENTER_SOURCE_HERE", "url": "https://www.example.com", "geo_location": "United States", "callback_url": "https://your.callback.url", "storage_type": "s3", "storage_url": "s3://your.storage.bucket.url"}'
import requests
from pprint import pprint
# Structure payload.
payload = {
"source": "ENTER_SOURCE_HERE", # Source you choose e.g. "universal"
"url": "https://www.example.com", # Check speficic source if you should use "url" or "query"
"geo_location": "United States", # Some sources accept zip-code or cooprdinates
#"render" : "html", # Uncomment you want to render JavaScript within the page
#"parse" : true, # Check what sources support parsed data
#"callback_url": "https://your.callback.url", #required if using callback listener
"callback_url": "https://your.callback.url",
"storage_type": "s3",
"storage_url": "s3://your.storage.bucket.url"
}
# Get response.
response = requests.request(
'POST',
'https://data.oxylabs.io/v1/queries',
auth=('YOUR_USERNAME', 'YOUR_PASSWORD'), #Your credentials go here
json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
<?php
$params = array(
'source' => 'ENTER_SOURCE_HERE', //Source you choose e.g. "universal"
'url' => 'https://www.example.com', // Check speficic source if you should use "url" or "query"
'geo_location' => 'United States', //Some sources accept zip-code or cooprdinates
//'render' : 'html', // Uncomment you want to render JavaScript within the page
//'parse' : TRUE, // Check what sources support parsed data
//'callback_url' => 'https://your.callback.url', //required if using callback listener
'callback_url': 'https://your.callback.url',
'storage_type' => 's3',
'storage_url' => 's3://your.storage.bucket.url'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://data.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "YOUR_USERNAME" . ":" . "YOUR_PASSWORD"); //Your credentials go here
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close ($ch);
?>
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
namespace OxyApi
{
class Program
{
static async Task Main()
{
const string Username = "YOUR_USERNAME";
const string Password = "YOUR_PASSWORD";
var parameters = new Dictionary<string, string>()
{
{ "source", "ENTER_SOURCE_HERE" },
{ "url", "https://example.com" },
{ "geo_location", "United States" },
{ "callback_url", "https://your.callback.url" },
};
var client = new HttpClient();
Uri baseUri = new Uri("https://data.oxylabs.io");
client.BaseAddress = baseUri;
var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/v1/queries");
requestMessage.Content = JsonContent.Create(parameters);
var authenticationString = quot;{Username}:{Password}";
var base64EncodedAuthenticationString = Convert.ToBase64String(System.Text.ASCIIEncoding.UTF8.GetBytes(authenticationString));
requestMessage.Headers.Add("Authorization", "Basic " + base64EncodedAuthenticationString);
var response = await client.SendAsync(requestMessage);
var contents = await response.Content.ReadAsStringAsync();
Console.WriteLine(contents);
}
}
}
package main
import (
"bytes"
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
)
func main() {
const Username = "YOUR_USERNAME"
const Password = "YOUR_PASSWORD"
payload := map[string]string{
"source": "ENTER_SOURCE_HERE",
"url": "https://example.com",
"geo_location": "United States",
"callback_url": "https://your.callback.url",
}
jsonValue, _ := json.Marshal(payload)
client := &http.Client{}
request, _ := http.NewRequest("POST",
"https://data.oxylabs.io/v1/queries",
bytes.NewBuffer(jsonValue),
)
request.Header.Add("Content-type", "application/json")
request.SetBasicAuth(Username, Password)
response, _ := client.Do(request)
responseText, _ := ioutil.ReadAll(response.Body)
fmt.Println(string(responseText))
}
import okhttp3.*;
import org.json.JSONObject;
public class Main implements Runnable {
private static final String AUTHORIZATION_HEADER = "Authorization";
public static final String USERNAME = "YOUR_USERNAME";
public static final String PASSWORD = "YOUR_PASSWORD";
public void run() {
JSONObject jsonObject = new JSONObject();
jsonObject.put("source", "ENTER_SOURCE_HERE");
jsonObject.put("url", "https://example.com");
jsonObject.put("geo_location", "United States");
jsonObject.put("callback_url", "https://your.callback.url");
Authenticator authenticator = (route, response) -> {
String credential = Credentials.basic(USERNAME, PASSWORD);
return response
.request()
.newBuilder()
.header(AUTHORIZATION_HEADER, credential)
.build();
};
var client = new OkHttpClient.Builder()
.authenticator(authenticator)
.build();
var mediaType = MediaType.parse("application/json; charset=utf-8");
var body = RequestBody.create(jsonObject.toString(), mediaType);
var request = new Request.Builder()
.url("https://data.oxylabs.io/v1/queries")
.post(body)
.build();
try (var response = client.newCall(request).execute()) {
assert response.body() != null;
System.out.println(response.body().string());
} catch (Exception exception) {
System.out.println("Error: " + exception.getMessage());
}
System.exit(0);
}
public static void main(String[] args) {
new Thread(new Main()).start();
}
}
import fetch from 'node-fetch';
const username = 'YOUR_USERNAME';
const password = 'YOUR_PASSWORD';
const body = {
source: 'ENTER_SOURCE_HERE',
url: 'https://www.example.com',
geo_location: 'United States',
callback_url: 'https://your.callback.url',
};
const response = await fetch('https://data.oxylabs.io/v1/queries', {
method: 'post',
body: JSON.stringify(body),
headers: {
'Content-Type': 'application/json',
'Authorization': 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64'),
}
});
console.log(await response.json());

Output

The API will respond with a JSON with the job information, similar to this:
{
"callback_url": "https://your.callback.url",
"client_id": 5,
"context": [
{
"key": "results_language",
"value": null
},
{
"key": "safe_search",
"value": null
},
{
"key": "tbm",
"value": null
},
{
"key": "cr",
"value": null
},
{
"key": "filter",
"value": null
}
],
"created_at": "2019-10-01 00:00:01",
"domain": "com",
"geo_location": "United States",
"id": "12345678900987654321",
"limit": 10,
"locale": null,
"pages": 1,
"parse": false,
"render": null,
"url": "https://www.example.com",
"source": "universal",
"start_page": 1,
"status": "pending",
"storage_type": "s3",
"storage_url": "YOUR_BUCKET_NAME/12345678900987654321.json",
"subdomain": "www",
"updated_at": "2019-10-01 00:00:01",
"user_agent_type": "desktop",
"_links": [
{
"rel": "self",
"href": "http://data.oxylabs.io/v1/queries/12345678900987654321",
"method": "GET"
},
{
"rel": "results",
"href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
"method": "GET"
}
]
}

Data dictionary

A lot of the keys in the output represent job input parameters. Please refer to the documentation of the scraper you are interested in to find the descriptions of the relevant job input parameters.
The dictionary below only describes the keys that are not described elsewhere.
Key
Description
Type
created_at
The datetime the job was created at.
String
client_id
The numerical ID associated with the username of the client making the request.
String
content_encoding
String
id
The unique ID of the job.
String
session_info
statuses
status
The status of the job. pending means the job is still being processed. done means we've completed the job. faulted means we came across errors while trying to complete the job -- and gave up at it.
String
subdomain
String
updated_at
The datetime the job was last updated at. For jobs that are finished (status is done or faulted), this datetime indicates when the job was finished.
String
links
The
JSON Array
links:rel
String
links:href
String
links:method
String

Check Job Status

Description

If you included a valid callback_url value when submitting your job. Upon completing the job, we will POST a JSON payload to the callback URL you specified. The JSON payload will indicate that the job has been finished and its status set to done.
However, if you submitted a job without a callback_url, you can check the job status yourself. To accomplish this, use the URL in href in rel:self, taken from the response message you received after submitting the job.
The URL for checking the job status will look similar to this: http://data.oxylabs.io/v1/queries/12345678900987654321. Querying this URL will return the job information, including its status.
There are 3 possible status values:
Parameter
Description
pending
The job is still being processed and has not been completed.
done
The job is completed, you may retrieve the result by querying the URL in href in rel:results, e.g. http://data.oxylabs.io/v1/queries/12345678900987654321/results
faulted
There was an issue with the job, and we couldn't complete it. You are not charged for any faulted jobs.

Endpoint

GET https://data.oxylabs.io/v1/queries/{id}

Input

cURL
Python
PHP
C#
Golang
Java
Node.js
curl --user user:pass1 \
'http://data.oxylabs.io/v1/queries/12345678900987654321'
import requests
from pprint import pprint
# Get response from stats endpoint.
response = requests.request(
method='GET',
url='http://data.oxylabs.io/v1/queries/12345678900987654321',
auth=('user', 'pass1'),
)
# Print prettified JSON response to stdout.
pprint(response.json())
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://data.oxylabs.io/v1/queries/12345678900987654321");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close ($ch);
?>
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
namespace OxyApi
{
class Program
{
static async Task Main()
{
const string JobId = "12345678900987654321";
const string Username = "YOUR_USERNAME";
const string Password = "YOUR_PASSWORD";
var client = new HttpClient();
Uri baseUri = new Uri("https://data.oxylabs.io");
client.BaseAddress = baseUri;
var requestMessage = new HttpRequestMessage(HttpMethod.Get, quot;/v1/queries/{JobId}");
var authenticationString = quot;{Username}:{Password}";
var base64EncodedAuthenticationString = Convert.ToBase64String(System.Text.ASCIIEncoding.UTF8.GetBytes(authenticationString));
requestMessage.Headers.Add("Authorization", "Basic " + base64EncodedAuthenticationString);
var response = await client.SendAsync(requestMessage);
var contents = await response.Content.ReadAsStringAsync();
Console.WriteLine(contents);
}
}
}
package main
import (
"fmt"
"io/ioutil"
"net/http"
)
func main() {
const JobId = "12345678900987654321"
const Username = "YOUR_USERNAME"
const Password = "YOUR_PASSWORD"
client := &http.Client{}
request, _ := http.NewRequest("GET",
fmt.Sprintf("https://data.oxylabs.io/v1/queries/%s", JobId),
nil,
)
request.Header.Add("Content-type", "application/json")
request.SetBasicAuth(Username, Password)
response, _ := client.Do(request)
responseText, _ := ioutil.ReadAll(response.Body)
fmt.Println(string(responseText))
}
import okhttp3.*;
public class Main implements Runnable {
private static final String AUTHORIZATION_HEADER = "Authorization";
private static final String JOB_ID = "12345678900987654321";
public static final String USERNAME = "YOUR_USERNAME";
public static final String PASSWORD = "YOUR_PASSWORD";
public void run() {
Authenticator authenticator = (route, response) -> {
String credential = Credentials.basic(USERNAME, PASSWORD);
return response
.request()
.newBuilder()
.header(AUTHORIZATION_HEADER, credential)
.build();
};
var client = new OkHttpClient.Builder()
.authenticator(authenticator)
.build();
var request = new Request.Builder()
.url(String.format("https://data.oxylabs.io/v1/queries/%s", JOB_ID))
.get()
.build();
try (var response = client.newCall(request).execute()) {
assert response.body() != null;
System.out.println(response.body().string());
} catch (Exception exception) {
System.out.println("Error: " + exception.getMessage());
}
System.exit(0);
}
public static void main(String[] args) {
new Thread(new Main()).start();
}
}
import fetch from 'node-fetch';
const jobId = '12345678900987654321';
const username = 'YOUR_USERNAME';
const password = 'YOUR_PASSWORD';
const response = await fetch(`https://data.oxylabs.io/v1/queries/${jobId}`, {
method: 'get',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64'),
}
});
console.log(await response.json());

Output

{
"client_id": 5,
"context": [
{
"key": "results_language",
"value": null
},
{
"key": "safe_search",
"value": null
},
{
"key": "tbm",
"value": null
},
{
"key": "cr",
"value": null
},
{
"key": "filter",
"value": null
}
],
"created_at": "2019-10-01 00:00:01",
"domain": "com",
"geo_location": null,
"id": "12345678900987654321",
"limit": 10,
"locale": null,
"pages": 1,
"parse": false,
"render": null,
"query": "adidas",
"source": "google_shopping_search",
"start_page": 1,
"status": "done",
"subdomain": "www",
"updated_at": "2019-10-01 00:00:15",
"user_agent_type": "desktop",
"_links": [
{
"rel": "self",
"href": "http://data.oxylabs.io/v1/queries/12345678900987654321",
"method": "GET"
},
{
"rel": "results",
"href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
"method": "GET"
}
]
}
The API will respond with query information in JSON format by printing it in the response body. Notice that the job status has been changed to done. You can now retrieve content by querying http://data.oxylabs.io/v1/queries/12345678900987654321/results. You can also see that the job has been updated_at 2019-10-01 00:00:15 - the job took 14 seconds to complete.

Retrieve Job Content

Description

Once you know the job is ready to be retrieved, you can GET it using the URL in href in rel:results of the job info response. The results link will looks like this: http://data.oxylabs.io/v1/queries/12345678900987654321/results.

Endpoint

GET https://data.oxylabs.io/v1/queries/{id}/results

Input

The results can be automatically retrieved without periodically checking job status by setting up Callback service. To do that, specify the URL of a server that is able to accept incoming HTTP(S) requests while submitting a job. When our system completes the job, it will POST a JSON payload to the provided URL, and the Callback service will download the results as described in the Callback implementation example.

Input

The code examples below show how you can use the /results endpoint.
cURL
Python
PHP
C#
Golang
Java
Node.js
curl --user user:pass1 \
'http://data.oxylabs.io/v1/queries/12345678900987654321/results'
import requests
from pprint import pprint
# Get response from stats endpoint.
response = requests.request(
method='GET',
url='http://data.oxylabs.io/v1/queries/12345678900987654321/results',
auth=('user', 'pass1'),
)
# Print prettified JSON response to stdout.
pprint(response.json())
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://data.oxylabs.io/v1/queries/12345678900987654321/results");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close ($ch);
?>
using System;
using System.Net.Http;
using System.Threading.Tasks;
namespace OxyApi
{
class Program
{
static async Task Main()
{
const string JobId = "12345678900987654321";
const string Username = "YOUR_USERNAME";
const string Password = "YOUR_PASSWORD";
var client = new HttpClient();
Uri baseUri = new Uri("https://data.oxylabs.io");
client.BaseAddress = baseUri;
var requestMessage = new HttpRequestMessage(HttpMethod.Get, quot;/v1/queries/{JobId}/results");
var authenticationString = quot;{Username}:{Password}";
var base64EncodedAuthenticationString = Convert.ToBase64String(System.Text.ASCIIEncoding.UTF8.GetBytes(authenticationString));
requestMessage.Headers.Add("Authorization", "Basic " + base64EncodedAuthenticationString);
var response = await client.SendAsync(requestMessage);
var contents = await response.Content.ReadAsStringAsync();
Console.WriteLine(contents);
}
}
}
package main
import (
"fmt"
"io/ioutil"
"net/http"
)
func main() {
const JobId = "12345678900987654321"
const Username = "YOUR_USERNAME"
const Password = "YOUR_PASSWORD"
client := &http.Client{}
request, _ := http.NewRequest("GET",
fmt.Sprintf("https://data.oxylabs.io/v1/queries/%s/results", JobId),
nil,
)
request.Header.Add("Content-type", "application/json")
request.SetBasicAuth(Username, Password)
response, _ := client.Do(request)
responseText, _ := ioutil.ReadAll(response.Body)
fmt.Println(string(responseText))
}
import okhttp3.*;
public class Main implements Runnable {
private static final String AUTHORIZATION_HEADER = "Authorization";
private static final String JOB_ID = "12345678900987654321";
public static final String USERNAME = "YOUR_USERNAME";
public static final String PASSWORD = "YOUR_PASSWORD";
public void run() {
Authenticator authenticator = (route, response) -> {
String credential = Credentials.basic(USERNAME, PASSWORD);
return response
.request()
.newBuilder()
.header(AUTHORIZATION_HEADER, credential)
.build();
};
var client = new OkHttpClient.Builder()
.authenticator(authenticator)
.build();
var request = new Request.Builder()
.url(String.format("https://data.oxylabs.io/v1/queries/%s/results", JOB_ID))
.get()
.build();
try (var response = client.newCall(request).execute()) {
assert response.body() != null;
System.out.println(response.body().string());
} catch (Exception exception) {
System.out.println("Error: " + exception.getMessage());
}
System.exit(0);
}
public static void main(String[] args) {
new Thread(new Main()).start();
}
}
import fetch from 'node-fetch';
const jobId = '12345678900987654321';
const username = 'YOUR_USERNAME';
const password = 'YOUR_PASSWORD';
const response = await fetch(`https://data.oxylabs.io/v1/queries/${jobId}/results`, {
method: 'get',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64'),
}
});
console.log(await response.json());

Output

The JSON file below contains a sample response of the /results endpoint:
{
"results": [
{
"content": "<!doctype html><html>
CONTENT
</html>",
"created_at": "2019-10-01 00:00:01",
"updated_at": "2019-10-01 00:00:15",
"page": 1,
"url": "https://www.google.com/search?q=adidas&hl=en&gl=US",
"job_id": "12345678900987654321",
"status_code": 200
}
]
}

Callback

The callback is a POST request we send to your machine, informing that the data extraction task is completed and providing a URL to download scraped content. This means that you no longer need to check job status manually. Once the data is here, we will let you know, and all you need to do now is to retrieve it. Please see code samples in Python and PHP.
Python
PHP
C#
Golang
Java
Node.js
# This is a simple Sanic web server with a route listening for callbacks on localhost:8080.
# It will print job results to stdout.
import requests
from pprint import pprint
from sanic import Sanic, response
AUTH_TUPLE = ('user', 'pass1')
app = Sanic()
# Define /job_listener endpoint that accepts POST requests.
@app.route('/job_listener', methods=['POST'])
async def job_listener(request):
try:
res = request.json
links = res.get('_links', [])
for link in links:
if link['rel'] == 'results':
# Sanic is async, but requests are synchronous, to fully take
# advantage of Sanic, use aiohttp.
res_response = requests.request(
method='GET',
url=link['href'],
auth=AUTH_TUPLE,
)
pprint(res_response.json())
break
except Exception as e:
print("Listener exception: {}".format(e))
return response.json(status=200, body={'status': 'ok'})
if __name__ == '__main__':
app.run(host='0.0.0.0', port=8080)
<?php
$stdout = fopen('php://stdout', 'w');
if (isset($_POST)) {
$result = array_merge($_POST, (array) json_decode(file_get_contents('php://input')));
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://data.oxylabs.io/v1/queries/".$result['id'].'/results');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$result = curl_exec($ch);
fwrite($stdout, $result);
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close ($ch);
}