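Other Domains
This data type is universal and can be applied to any domain. It accepts URLs along with additional parameters. You can find the list of all available parameters in the URL section.
Overview
Below is a quick overview of all the data source values we support for other domains.
URL
Query parameters
- required parameter
Code examples
In this example, the API retrieves an e-commerce product page. All available parameters are included (though they are not always necessary or compatible within the same request) to show how to format your request: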
{
"source": "universal_ecommerce",
"url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
"user_agent_type": "desktop",
"geo_location": "United States",
"parse": true,
"parser_type": "ecommerce_product",
"context": [
{
"key": "headers",
"value": {
"Accept-Language": "en-US",
"Content-Type": "application/octet-stream",
"Custom-Header-Name": "custom header content"
}
},
{
"key": "cookies",
"value": [
{
"key": "NID",
"value": "1234567890"
},
{
"key": "1P_JAR",
"value": "0987654321"
}]
},
{
"key": "follow_redirects",
"value": true
},
{
"key": "http_method", "value": "get"
},
{
"key": "content",
"value": "YmFzZTY0RW5jb2RlZFBPU1RCb2R5"
},
{
"key": "successful_status_codes",
"value": [808, 909]
}]
}
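The content value in the request above appears to be the Base64 encoding of the raw request body (the Python and PHP samples use the placeholder base64EncodedPOSTBody in the same position). The following is a minimal sketch of producing such a value with Python's standard library, assuming standard Base64 is what the API expects. The helper name encode_post_body is hypothetical and not part of any Oxylabs client.

```python
import base64


def encode_post_body(body: bytes) -> str:
    """Return the Base64 text form of a raw request body (hypothetical helper)."""
    return base64.b64encode(body).decode("ascii")


# Context entry carrying the encoded body, shaped like the entries in the example.
content_entry = {
    "key": "content",
    "value": encode_post_body(b"base64EncodedPOSTBody"),
}

print(content_entry["value"])  # prints YmFzZTY0RW5jb2RlZFBPU1RCb2R5
```

Decoding the value with base64.b64decode returns the original bytes, which is a quick way to check a payload before sending it.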
curl --user user:pass \
'https://realtime.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{"source": "universal_ecommerce", "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html", "user_agent_type": "desktop","geo_location": "United States", "parse": true, "parser_type": "ecommerce_product", "context": [{"key": "headers", "value": {"Accept-Language": "en-US", "Content-Type": "application/octet-stream", "Custom-Header": "custom header content"}}, {"key": "cookies", "value": [{"key": "NID", "value": "1234567890"}, {"key": "1P_JAR", "value": "0987654321"}]}, {"key": "follow_redirects", "value": true}, {"key": "http_method", "value": "get"}, {"key": "content", "value": "abcd1234"}, {"key": "successful_status_codes", "value": [707, 808, 909]}]}'
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'universal_ecommerce',
    'url': 'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
    'user_agent_type': 'desktop',
    'geo_location': 'United States',
    'parse': True,
    'parser_type': 'ecommerce_product',
    'context': [
        {
            'key': 'session_id',
            'value': '1234567890abcdef'
        },
        {
            'key': 'headers',
            'value': {
                'Accept-Language': 'en-US',
                'Content-Type': 'application/octet-stream',
                'Custom-Header': 'custom header content'
            }
        },
        {
            'key': 'cookies',
            'value': [
                {
                    'key': 'NID',
                    'value': '1234567890'
                },
                {
                    'key': '1P_JAR',
                    'value': '0987654321'
                }
            ]
        },
        {
            'key': 'follow_redirects',
            'value': True
        },
        {
            'key': 'successful_status_codes',
            'value': [303, 808, 909]
        },
        {
            'key': 'http_method',
            'value': 'get'
        },
        {
            'key': 'content',
            'value': 'base64EncodedPOSTBody'
        }
    ],
}
# Get response.
response = requests.request(
'POST',
'https://realtime.oxylabs.io/v1/queries',
auth=('user', 'pass1'),
json=payload,
)
# Instead of response with job status and results url, this will return the
# JSON response with the result.
pprint(response.json())
<?php
$params = [
'source' => 'universal_ecommerce',
'url' => 'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
'geo_location' => 'United States',
'parse' => true,
'parser_type' => 'ecommerce_product',
'context' => [
[
'key' => 'session_id',
'value' => '1234567890abcdef'
],
[
'key' => 'headers',
'value' => [
'Accept-Language' => 'en-US',
'Content-Type' => 'application/octet-stream',
'Custom-Header' => 'custom header content'
],
],
[
'key' => 'cookies',
'value' => [
['key' => 'NID', 'value' => '1234567890'],
['key' => '1P_JAR', 'value' => '0987654321']
]
],
[
'key' => 'follow_redirects',
'value' => true
],
[
'key' => 'successful_status_codes',
'value' => [303, 808, 909]
],
[
'key' => 'http_method',
'value' => 'get'
],
[
'key' => 'content',
'value' => 'base64EncodedPOSTBody'
]
]
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close ($ch);
?>
# The whole string you submit has to be URL-encoded.
https://realtime.oxylabs.io/v1/queries?source=universal_ecommerce&url=https%3A%2F%2Fstackoverflow.com%2Fquestions%2Ftagged%2Fpython&access_token=12345abcde
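The examples above use the Realtime integration method. If you would like to use another integration method in your queries (such as Push-Pull or Proxy Endpoint), refer to the Integration Methods section.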
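The URL-encoded GET request shown earlier can be assembled programmatically instead of by hand. This is a minimal sketch using urllib.parse.urlencode from the Python standard library. The function name build_get_request_url is hypothetical, and the access token is the placeholder from the example, not a working credential.

```python
from urllib.parse import urlencode

REALTIME_ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def build_get_request_url(params: dict) -> str:
    """Percent-encode every parameter value and append it to the endpoint."""
    return f"{REALTIME_ENDPOINT}?{urlencode(params)}"


url = build_get_request_url({
    "source": "universal_ecommerce",
    "url": "https://stackoverflow.com/questions/tagged/python",
    "access_token": "12345abcde",  # placeholder token taken from the example
})
print(url)
```

Note that urlencode writes spaces as + by default. Pass quote_via=urllib.parse.quote if %20 is preferred.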
Forming URLs
Wayfair
Job parameters map to the URL as follows:
https://www.wayfair.<domain>/keyword.php?keyword=<query>&itemsperpage=<limit>&curpage=<start_page>
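When forming the URL, follow these instructions:
Encode the search term: the search term must be URL-encoded. For example, replace each space with %20, which represents a space character in a URL.
Parameters: if limit equals 48 and start_page equals 1, append the following additional parameters to the URL:
command=dosearch
new_keyword_search=true
Examples of generated URLs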
https://www.wayfair.com/keyword.php?keyword=test&itemsperpage=24&curpage=1
https://www.wayfair.fr/keyword.php?keyword=t%202&itemsperpage=48&curpage=1&command=dosearch&new_keyword_search=true
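The URL-forming rules for Wayfair can be sketched as a small helper. The function name build_wayfair_search_url and its argument names are hypothetical; they simply mirror the job parameters (domain, query, limit, start_page) in the template. The sketch uses urllib.parse.quote so that spaces become %20.

```python
from urllib.parse import quote


def build_wayfair_search_url(query: str, domain: str, limit: int, start_page: int) -> str:
    """Build a Wayfair search URL from job parameters (illustrative helper)."""
    # safe="" makes quote() percent-encode every reserved character, including "/".
    keyword = quote(query, safe="")
    url = (
        f"https://www.wayfair.{domain}/keyword.php"
        f"?keyword={keyword}&itemsperpage={limit}&curpage={start_page}"
    )
    # The two extra parameters are required only for this exact combination.
    if limit == 48 and start_page == 1:
        url += "&command=dosearch&new_keyword_search=true"
    return url


print(build_wayfair_search_url("test", "com", 24, 1))
print(build_wayfair_search_url("t 2", "fr", 48, 1))
```

The two calls reproduce the two example URLs listed above.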
Equivalent job examples
Deprecated Wayfair search source:
{
"source": "wayfair_search",
"query": "test",
"domain": "com",
"limit": 5,
"start_page": 3
}
Updated universal_ecommerce source:
{
"source": "universal_ecommerce",
"url": "https://www.wayfair.com/keyword.php?keyword=test&itemsperpage=5&curpage=3"
}
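Migrating existing jobs can be automated by translating each deprecated wayfair_search payload into a universal_ecommerce payload. The sketch below assumes every job carries the query, domain, limit, and start_page keys. The function name wayfair_job_to_universal is hypothetical.

```python
from urllib.parse import quote


def wayfair_job_to_universal(job: dict) -> dict:
    """Convert a deprecated wayfair_search job into a universal_ecommerce job."""
    if job.get("source") != "wayfair_search":
        raise ValueError("expected a job with source 'wayfair_search'")
    url = (
        f"https://www.wayfair.{job['domain']}/keyword.php"
        f"?keyword={quote(job['query'], safe='')}"
        f"&itemsperpage={job['limit']}&curpage={job['start_page']}"
    )
    # Extra parameters apply only when limit is 48 and start_page is 1.
    if job["limit"] == 48 and job["start_page"] == 1:
        url += "&command=dosearch&new_keyword_search=true"
    return {"source": "universal_ecommerce", "url": url}


old_job = {
    "source": "wayfair_search",
    "query": "test",
    "domain": "com",
    "limit": 5,
    "start_page": 3,
}
print(wayfair_job_to_universal(old_job))
```

Rejecting any other source value keeps the helper from silently rewriting jobs it was not designed for.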