Baidu
There are two ways to retrieve data from Baidu with our SERP Scraper API: you can give us a full URL, or pass parameters through a purpose-built data source (Search).
Overview
Below is a quick overview of all the data source values we support for Baidu:
URL: baidu
Search: baidu_search
URL
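The baidu source is designed to retrieve content from a direct URL to any Baidu page. Instead of sending multiple parameters, you can provide the direct URL of the Baidu page you need. We do not strip any parameters or alter your URL in any other way.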
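Because the baidu source passes your URL through unchanged, building a well-formed URL is up to you. Below is a minimal Python sketch, assuming Baidu's /s search page reads the search term from the wd parameter and the input encoding from ie, as in the example URL in this section. build_baidu_search_url is a hypothetical helper, not part of the API.

```python
from urllib.parse import urlencode


def build_baidu_search_url(term: str) -> str:
    """Hypothetical helper: build a direct Baidu search URL for a term.

    Assumes Baidu's /s endpoint takes the term in `wd` and the input
    encoding in `ie`, as in the example URL of this section.
    """
    # urlencode() percent-encodes the term (spaces become '+', non-ASCII
    # characters are encoded as UTF-8 bytes).
    return "http://www.baidu.com/s?" + urlencode({"ie": "utf-8", "wd": term})


# The resulting URL is passed as-is in the `url` field of a baidu request.
payload = {
    "source": "baidu",
    "url": build_baidu_search_url("adidas"),
}
print(payload["url"])  # → http://www.baidu.com/s?ie=utf-8&wd=adidas
```

Encoding the term with urlencode rather than string concatenation keeps terms containing spaces, & or Chinese characters from breaking the URL.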
Query parameters
- mandatory parameter
Code examples
In the example below, we make a request to retrieve the result for the provided URL.
{
"source": "baidu",
"url": "http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas"
}
curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
-d '{"source": "baidu", "url": "http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas"}'
import requests
from pprint import pprint
# Structure payload.
payload = {
'source': 'baidu',
'url': 'http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas'
}
# Get response.
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=('user', 'pass1'),
json=payload,
)
# Instead of a response with the job status and a results URL, this returns
# the JSON response with the results.
pprint(response.json())
<?php
$params = array(
'source' => 'baidu',
'url' => 'http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close ($ch);
?>
# URL has to be encoded to escape `&` and `=` characters:
# URL: http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas
# Encoded URL: http%3A%2F%2Fwww.baidu.com%2Fs%3Fie%3Dutf-8%26f%3D8%26rsv_bp%3D1%26rsv_idx%3D1%26ch%3D%26tn%3Dbaidu%26bar%3D%26wd%3Dadidas
https://realtime.oxylabs.io/v1/queries?source=baidu&url=http%3A%2F%2Fwww.baidu.com%2Fs%3Fie%3Dutf-8%26f%3D8%26rsv_bp%3D1%26rsv_idx%3D1%26ch%3D%26tn%3Dbaidu%26bar%3D%26wd%3Dadidas&access_token=12345abcde
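The examples above use the Realtime integration method. To use a different integration method in your queries (such as Push-Pull or Proxy Endpoint), refer to the Integration methods section.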
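In the GET examples, the value of url must be percent-encoded; otherwise the & characters inside the Baidu URL are read as separate query parameters. Rather than encoding by hand, you can let Python's urllib.parse.urlencode escape every value. This is a minimal sketch: build_realtime_get_url is a hypothetical helper, and 12345abcde is the placeholder access token used in the examples above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Realtime endpoint used in this section's examples.
REALTIME_ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def build_realtime_get_url(params: dict, access_token: str) -> str:
    """Hypothetical helper: build a Realtime GET URL with encoded values.

    urlencode() percent-encodes each value on its own (including ':', '/',
    '?', '&' and '='), so a full Baidu URL stays a single `url` parameter.
    """
    query = dict(params, access_token=access_token)
    return REALTIME_ENDPOINT + "?" + urlencode(query)


baidu_url = ("http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1"
             "&ch=&tn=baidu&bar=&wd=adidas")
url_get = build_realtime_get_url({"source": "baidu", "url": baidu_url}, "12345abcde")
print(url_get)

# Decoding the query string should give back the untouched Baidu URL.
decoded = parse_qs(urlsplit(url_get).query, keep_blank_values=True)
print(decoded["url"][0] == baidu_url)  # → True

# Plain parameters (as used by baidu_search) go through the same helper.
search_get = build_realtime_get_url(
    {"source": "baidu_search", "domain": "com", "query": "adidas",
     "start_page": 11, "pages": 10},
    "12345abcde",
)
print(search_get)
```

Encoding each value separately, instead of the whole query string, is what keeps the outer &-separated parameters (source, url, access_token) distinct from the & characters inside the Baidu URL.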
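Search
The baidu_search source is designed to retrieve Baidu search engine results pages (SERPs) in HTML format.
Query parameters
- mandatory parameter
Code examples
In the example below, we make a request to retrieve 10 Baidu SERPs, starting from page 11, for the search term adidas.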
{
"source": "baidu_search",
"domain": "com",
"query": "adidas",
"start_page": 11,
"pages": 10
}
curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
-d '{"source": "baidu_search", "domain": "com", "query": "adidas", "start_page": 11, "pages": 10}'
import requests
from pprint import pprint
# Structure payload.
payload = {
'source': 'baidu_search',
'domain': 'com',
'query': 'adidas',
'start_page': 11,
'pages': 10
}
# Get response.
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=('user', 'pass1'),
json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
<?php
$params = array(
'source' => 'baidu_search',
'domain' => 'com',
'query' => 'adidas',
'start_page' => 11,
'pages' => 10
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close ($ch);
?>
https://realtime.oxylabs.io/v1/queries?source=baidu_search&domain=com&query=adidas&start_page=11&pages=10&access_token=12345abcde
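The examples above use the Realtime integration method. To use a different integration method in your queries (such as Push-Pull or Proxy Endpoint), refer to the Integration methods section.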
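The example above is described as retrieving 10 SERPs starting from page 11, which works out to pages 11 through 20. The Python sketch below makes that arithmetic explicit, assuming the scraper fetches pages consecutive result pages beginning at start_page. requested_pages is a hypothetical helper, and its range check is its own sanity check rather than a documented API rule.

```python
def requested_pages(start_page: int, pages: int) -> list[int]:
    """Hypothetical helper: SERP page numbers a baidu_search request covers.

    Assumes `pages` consecutive result pages are fetched, beginning at
    `start_page`, as the example above describes.
    """
    # Own sanity check, not a documented API constraint.
    if start_page < 1 or pages < 1:
        raise ValueError("start_page and pages must both be >= 1")
    return list(range(start_page, start_page + pages))


payload = {"source": "baidu_search", "domain": "com", "query": "adidas",
           "start_page": 11, "pages": 10}
pages = requested_pages(payload["start_page"], payload["pages"])
print(pages[0], pages[-1], len(pages))  # → 11 20 10
```

The last page is start_page + pages - 1, so setting pages to 1 retrieves only the page given in start_page.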