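Other Domains
This data type is universal and can be applied to any domain. It accepts a URL along with additional parameters. The available parameters are listed in the table below.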
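Query parameters

| Parameter | Description | Default value |
| --- | --- | --- |
| url * | Direct URL (link) to the universal page. | N/A |
| locale | Locale, as expected in the Accept-Language header. | N/A |
| context: content | Base64-encoded POST request body. Only useful if http_method is set to post. | N/A |
| context: cookies | Pass your own cookies. | N/A |
| context: follow_redirects | Indicates whether the scraper should follow redirects (3xx responses with a destination URL) to retrieve the content of the URL at the end of the redirect chain. | N/A |
| context: headers | Pass your own headers. | N/A |
| context: http_method | Set it to post if you would like to make a POST request to your target URL via the Web Scraper API. | get |
| context: session_id | Use this parameter to keep the same proxy across multiple requests. Set your session to any string you like, and we will assign a proxy to that ID and keep it for up to 10 minutes. After that, if you make another request with the same session ID, a new proxy will be assigned to that session ID. | N/A |
| context: successful_status_codes | Define one or more custom HTTP response codes upon which we should consider the scrape successful and return the content to you. This may be useful if you want us to return the 503 error page, or in some other non-standard cases. | N/A |

* - required parameter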
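The relationship between http_method and content in the table can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name build_post_payload, the target URL, the request body, and the session string are hypothetical, and the source value "universal" simply follows the JSON example in this section. The snippet only builds the payload and does not send it.

```python
import base64
import json


def build_post_payload(url, body, session_id=None):
    """Build a payload asking the API to POST `body` to `url` (hypothetical helper)."""
    # The `content` context value must be the Base64-encoded request body.
    encoded_body = base64.b64encode(body.encode("utf-8")).decode("ascii")
    context = [
        # `content` is only useful when http_method is set to post.
        {"key": "http_method", "value": "post"},
        {"key": "content", "value": encoded_body},
    ]
    if session_id is not None:
        # Reusing the same string across requests keeps the same proxy
        # for up to 10 minutes, per the session_id description above.
        context.append({"key": "session_id", "value": session_id})
    return {"source": "universal", "url": url, "context": context}


payload = build_post_payload(
    "https://example.com/search",      # placeholder target URL (assumption)
    json.dumps({"query": "books"}),    # placeholder POST body (assumption)
    session_id="my-session-1",         # any string of your choice
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the json argument of a requests call in the same way as the payloads in the code examples of this section.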
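Code examples
In this example, the API retrieves an e-commerce product page. All available parameters are included (although they are not always necessary or compatible within the same request) to show how to format your requests: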
{
    "source": "universal", 
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html", 
    "user_agent_type": "desktop",
    "geo_location": "United States",
    "context": [
        {
            "key": "headers", 
            "value": {
                "Accept-Language": "en-US", 
                "Content-Type": "application/octet-stream", 
                "Custom-Header-Name": "custom header content"
                }
        }, 
        {
            "key": "cookies", 
            "value": [
                {
                    "key": "NID", 
                    "value": "1234567890"
                },
                {
                    "key": "1P_JAR",
                    "value": "0987654321"
                }]
        },
        {
            "key": "follow_redirects",
            "value": true
        },
        {
            "key": "http_method", "value": "get"
        },
        {
            "key": "content",
            "value": "base64EncodedPOSTBody"
        },
        {
            "key": "successful_status_codes",
            "value": [303, 808, 909]
        }]
}

curl --user user:pass1 \
'https://realtime.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{"source": "universal", "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html", "user_agent_type": "desktop", "geo_location": "United States", "context": [{"key": "headers", "value": {"Accept-Language": "en-US", "Content-Type": "application/octet-stream", "Custom-Header": "custom header content"}}, {"key": "cookies", "value": [{"key": "NID", "value": "1234567890"}, {"key": "1P_JAR", "value": "0987654321"}]}, {"key": "follow_redirects", "value": true}, {"key": "http_method", "value": "get"}, {"key": "content", "value": "abcd1234"}, {"key": "successful_status_codes", "value": [707, 808, 909]}]}'

import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'universal_ecommerce',
    'url': 'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
    'user_agent_type': 'desktop',
    'geo_location': 'United States',
    'context': [
        {
            'key': 'session_id',
            'value': '1234567890abcdef'
        },
        {
            'key': 'headers',
            'value': {
                'Accept-Language': 'en-US',
                'Content-Type': 'application/octet-stream',
                'Custom-Header': 'custom header content'
            }
        },
        {
            'key': 'cookies',
            'value': [
                {'key': 'NID', 'value': '1234567890'},
                {'key': '1P_JAR', 'value': '0987654321'}
            ]
        },
        {
            'key': 'follow_redirects',
            'value': True  # Python boolean; serialized as JSON true
        },
        {
            'key': 'successful_status_codes',
            'value': [303, 808, 909]
        },
        {
            'key': 'http_method',
            'value': 'get'
        },
        {
            # Base64-encoded POST body; only useful when http_method is 'post'.
            'key': 'content',
            'value': 'base64EncodedPOSTBody'
        }
    ],
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)
# Instead of response with job status and results url, this will return the
# JSON response with the result.
pprint(response.json())

<?php
$params = [
    'source' => 'universal_ecommerce',
    'url' => 'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
    'geo_location' => 'United States',
    'context' => [
        [
            'key' => 'session_id',
            'value' => '1234567890abcdef'
        ],
        [
            'key' => 'headers',
            'value' => [
                'Accept-Language' => 'en-US',
                'Content-Type' => 'application/octet-stream',
                'Custom-Header' => 'custom header content'
            ],
        ],
        [
            'key' => 'cookies',
            'value' => [
                ['key' => 'NID', 'value' => '1234567890'],
                ['key' => '1P_JAR', 'value' => '0987654321']
            ]
        ],
        [
            'key' => 'follow_redirects',
            'value' => true
        ],
        [
            'key' => 'successful_status_codes',
            'value' => [303, 808, 909]
        ],
        [
            'key' => 'http_method',
            'value' => 'get'
        ],
        [
            'key' => 'content',
            'value' => 'base64EncodedPOSTBody'
        ]
    ]
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");
$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
    echo 'Error:' . curl_error($ch);
}
curl_close($ch);
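?>

# The whole string you submit has to be URL-encoded.
https://realtime.oxylabs.io/v1/queries?source=universal_ecommerce&url=https%3A%2F%2Fstackoverflow.com%2Fquestions%2Ftagged%2Fpython&access_token=12345abcde

The examples above use the Realtime integration method. If you would like to use another integration method in your queries (such as Push-Pull or Proxy Endpoint), refer to the integration methods section.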
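The URL-encoded request string shown in the last example can be produced with Python's standard library rather than by hand. The sketch below reuses the placeholder values from that example (including the access_token, which is not a real credential) and only builds the string without sending a request.

```python
from urllib.parse import urlencode

# Query parameters taken from the URL example in this section.
params = {
    "source": "universal_ecommerce",
    "url": "https://stackoverflow.com/questions/tagged/python",
    "access_token": "12345abcde",  # placeholder token, not a real credential
}

# urlencode percent-encodes reserved characters such as ':' and '/'
# in each value and joins the pairs with '&'.
request_url = "https://realtime.oxylabs.io/v1/queries?" + urlencode(params)
print(request_url)
```

Encoding each parameter value this way keeps characters in the target URL (such as ? or &) from being misread as part of the outer query string.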