Oxylabs Documentation
English
Search
⌃K

Baidu

There are two approaches to retrieving data from Baidu using our SERP Scraper API. You can give us a full URL or pass parameters via the specifically built data source - Search.

Overview

Below is a quick overview of all the available data source values we support with Baidu.
Source
Description
Structured data
baidu
Submit any Baidu URL you like.
No.
baidu_search
Baidu SERPs.
No.

URL

The baidu source is designed to retrieve the content from direct URLs of various Baidu pages. Instead of sending multiple parameters, you can provide us with a direct URL required for Baidu page. We do not strip any parameters or alter your URLs in any other way.

Query parameters

Parameter
Description
Default Value
source
Data source. More info.
baidu
url
Direct URL (link) to Baidu page
-
user_agent_type
Device type and browser. The full list can be found here.
desktop
callback_url
UURL to your callback endpoint. More info.
-
- required parameter

Code examples

In the example below, we make a request to retrieve a result for the provided URL.
JSON
cURL
Python
PHP
HTTP
{
"source": "baidu",
"url": "http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas"
}
curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "baidu", "url": "http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas"}'
import requests
from pprint import pprint
# Structure payload.
payload = {
'source': 'baidu',
'url': 'http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas'
}
# Get response.
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=('user', 'pass1'),
json=payload,
)
# Instead of response with job status and results url, this will return the
# JSON response with results.
pprint(response.json())
<?php
$params = array(
'source' => 'baidu',
'url' => 'http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://data.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close($ch);
# URL has to be encoded to escape `&` and `=` characters:
# URL: http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas
# Encoded URL: http%3A%2F%2Fwww.baidu.com%2Fs%3Fie%3Dutf-8%26f%3D8%26rsv_bp%3D1%26rsv_idx%3D1%26ch%3D%26tn%3Dbaidu%26bar%3D%26wd%3Dadidas
https://realtime.oxylabs.io/v1/queries?source=baidu&url=http%3A%2F%2Fwww.baidu.com%2Fs%3Fie%3Dutf-8%26f%3D8%26rsv_bp%3D1%26rsv_idx%3D1%26ch%3D%26tn%3Dbaidu&bar=&wd=adidas&access_token=12345abcde
The example above uses the Realtime integration method. If you would like to use some other integration method in your query (e.g. Push-Pull or Proxy Endpoint), refer to the integration methods section.
The baidu_search source is designed to retrieve Baidu Search results (SERPs) in HTML format.

Query parameters

Parameter
Description
Default Value
source
Data source. More info.
baidu_search
domain
Domain localization
com
query
UTF-encoded keyword
-
start_page
Starting page number
1
pages
Number of pages to retrieve
1
limit
Number of results to retrieve in each page
10
user_agent_type
Device type and browser. The full list can be found here.
desktop
callback_url
URL to your callback endpoint. More info.
-
- required parameter

Code examples

In the example below, we make a request to retrieve 10 Baidu SERPs, starting with the 11th page, for the search term adidas.
JSON
cURL
Python
PHP
HTTP
{
"source": "baidu_search",
"domain": "com",
"query": "adidas",
"start_page": 11,
"pages": 10
}
curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "baidu_search", "domain": "com", "query": "adidas", "start_page": 11, "pages": 10, "callback_url": "https://your.callback.url"}'
import requests
from pprint import pprint
# Structure payload.
payload = {
'source': 'baidu_search',
'domain': 'com',
'query': 'adidas',
'start_page': 11,
'pages': 10
}
# Get response.
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=('user', 'pass1'),
json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
<?php
$params = array(
'source' => 'baidu_search',
'domain' => 'com',
'query' => 'adidas',
'start_page' => 11,
'pages' => 10,
'callback_url' => 'https://your.callback.url'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://data.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close($ch);
https://realtime.oxylabs.io/v1/queries?source=baidu_search&domain=com&query=adidas&start_page=11&pages=10&access_token=12345abcde
The example above uses the Realtime integration method. If you would like to use some other integration method in your query (e.g. Push-Pull or Proxy Endpoint), refer to the integration methods section.