Baidu
There are two approaches to retrieving data from Baidu using our SERP Scraper API. You can give us a full URL or pass parameters via the specifically built data source - Search.
Below is a quick overview of all the available data
source
values we support with Baidu.Source | Description | Structured data |
---|---|---|
baidu | Submit any Baidu URL you like. | No. |
baidu_search | Baidu SERPs. | No. |
Although we do not have dedicated parsers for Baidu, you can write your own parsing instructions with Custom Parser feature and get structured data.
The
baidu
source is designed to retrieve the content from direct URLs of various Baidu pages. Instead of sending multiple parameters, you can provide us with a direct URL required for Baidu page. We do not strip any parameters or alter your URLs in any other way. - required parameter
In the example below, we make a request to retrieve a result for the provided URL.
JSON
cURL
Python
PHP
HTTP
{
"source": "baidu",
"url": "http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas"
}
curl --user "user:pass1" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "baidu", "url": "http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas"}'
import requests
from pprint import pprint
# Structure payload.
payload = {
'source': 'baidu',
'url': 'http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas'
}
# Get response.
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=('user', 'pass1'),
json=payload,
)
# Instead of response with job status and results url, this will return the
# JSON response with results.
pprint(response.json())
<?php
$params = array(
'source' => 'baidu',
'url' => 'http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://data.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close($ch);
# URL has to be encoded to escape `&` and `=` characters:
# URL: http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas
# Encoded URL: http%3A%2F%2Fwww.baidu.com%2Fs%3Fie%3Dutf-8%26f%3D8%26rsv_bp%3D1%26rsv_idx%3D1%26ch%3D%26tn%3Dbaidu%26bar%3D%26wd%3Dadidas
https://realtime.oxylabs.io/v1/queries?source=baidu&url=http%3A%2F%2Fwww.baidu.com%2Fs%3Fie%3Dutf-8%26f%3D8%26rsv_bp%3D1%26rsv_idx%3D1%26ch%3D%26tn%3Dbaidu&bar=&wd=adidas&access_token=12345abcde
The example above uses the Realtime integration method. If you would like to use some other integration method in your query (e.g. Push-Pull or Proxy Endpoint), refer to the integration methods section.
The
baidu_search
source is designed to retrieve Baidu Search results (SERPs) in HTML format.Parameter | Description | Default Value |
---|---|---|
source | baidu_search | |
domain | Localize results for a certain country. Valid values: com ,cn . | com |
query | UTF-encoded keyword | - |
start_page | Starting page number | 1 |
pages | Number of pages to retrieve | 1 |
limit | Number of results to retrieve in each page | 10 |
user_agent_type | desktop | |
callback_url | - |
- required parameter
In the example below, we make a request to retrieve
10
Baidu SERPs, starting with the 11
th page, for the search term adidas
.JSON
cURL
Python
PHP
HTTP
{
"source": "baidu_search",
"domain": "com",
"query": "adidas",
"start_page": 11,
"pages": 10
}
curl --user "user:pass1" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "baidu_search", "domain": "com", "query": "adidas", "start_page": 11, "pages": 10, "callback_url": "https://your.callback.url"}'
import requests
from pprint import pprint
# Structure payload.
payload = {
'source': 'baidu_search',
'domain': 'com',
'query': 'adidas',
'start_page': 11,
'pages': 10
}
# Get response.
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=('user', 'pass1'),
json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
<?php
$params = array(
'source' => 'baidu_search',
'domain' => 'com',
'query' => 'adidas',
'start_page' => 11,
'pages' => 10,
'callback_url' => 'https://your.callback.url'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://data.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close($ch);
https://realtime.oxylabs.io/v1/queries?source=baidu_search&domain=com&query=adidas&start_page=11&pages=10&access_token=12345abcde
The example above uses the Realtime integration method. If you would like to use some other integration method in your query (e.g. Push-Pull or Proxy Endpoint), refer to the integration methods section.
Last modified 5mo ago