Push-Pull
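Learn about the Push-Pull integration method of the Oxylabs Web Scraper API: submit a job, then poll the results endpoint later to retrieve the data in JSON format.
Single job
Endpoint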
POST https://data.oxylabs.io/v1/queries
Input
cURL
curl --user "user:pass1" \
  'https://data.oxylabs.io/v1/queries' \
  -H "Content-Type: application/json" \
  -d '{"source": "ENTER_SOURCE_HERE", "url": "https://www.example.com", "geo_location": "United States", "callback_url": "https://your.callback.url", "storage_type": "s3", "storage_url": "s3://your.storage.bucket.url"}'
Python
import requests
from pprint import pprint

# Structure payload.
payload = {
    "source": "ENTER_SOURCE_HERE",  # Source you choose, e.g. "universal"
    "url": "https://www.example.com",  # Check the specific source to see whether it expects "url" or "query"
    "geo_location": "United States",  # Some sources accept a zip code or coordinates
    # "render": "html",  # Uncomment if you want to render JavaScript within the page
    # "render": "png",  # Uncomment if you want to take a screenshot of the scraped web page
    # "parse": True,  # Check which sources support parsed data
    "callback_url": "https://your.callback.url",  # Required only if you use a callback listener
    "storage_type": "s3",
    "storage_url": "s3://your.storage.bucket.url",
}

# Get response.
response = requests.request(
    'POST',
    'https://data.oxylabs.io/v1/queries',
    auth=('YOUR_USERNAME', 'YOUR_PASSWORD'),  # Your credentials go here
    json=payload,
)

# Print prettified response to stdout.
pprint(response.json())
Output
Data dictionary
Key
Description
Type
Callback
Input
Output
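A callback listener can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the official listener: the payload field names (`id`, `status`), the helper names `parse_callback` and `serve`, and port 8080 are all assumptions that this page does not confirm.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body: bytes) -> dict:
    """Pull the job ID and status out of a callback body.

    Assumption: the body is a JSON object with "id" and "status" keys.
    """
    data = json.loads(body.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("callback body is not a JSON object")
    return {"job_id": data.get("id"), "status": data.get("status")}


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        try:
            info = parse_callback(self.rfile.read(length))
        except (ValueError, UnicodeDecodeError):
            self.send_response(400)
            self.end_headers()
            return
        print("callback received:", info)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the listener until interrupted (hypothetical port)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

The URL that exposes this listener is what you would pass as `callback_url` when submitting the job.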
Check job status
Endpoint
Input
Output
Status values
Parameter
Description
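If you do not use a callback, the Push-Pull flow comes down to polling the job status until it reaches a final state. The sketch below shows one way to do that. The status strings `"pending"`, `"done"`, and `"faulted"`, the helper name `wait_for_job`, and the default interval and timeout are assumptions, so check them against the status values table before relying on them.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job finishes or the timeout expires.

    Assumption: "done" and "faulted" are the final states, and any other
    value (for example "pending") means the job is still running.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("done", "faulted"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)


# Possible wiring with the requests library (the status URL is an assumption):
#
#   def get_status():
#       r = requests.get(f"https://data.oxylabs.io/v1/queries/{job_id}",
#                        auth=("YOUR_USERNAME", "YOUR_PASSWORD"))
#       return r.json()["status"]
#
#   final_status = wait_for_job(get_status)
```

Passing the status fetcher in as a callable keeps the loop independent of any HTTP client and makes it easy to test without network access.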
Retrieve job content
Endpoint
Input
Output
Render parameter
Parse parameter
XHR parameter
Default output
Available outputs
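To illustrate how a client might request a specific output, here is a small URL builder. It is a sketch under stated assumptions: the `/results` path suffix, the `type` query parameter, and the example values `raw` and `parsed` are not confirmed by this page, and `results_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://data.oxylabs.io/v1/queries"


def results_url(job_id: str, result_types: tuple = ()) -> str:
    """Build the URL for fetching a job's results.

    Assumptions: results live under "<job>/results" and an optional
    comma-separated "type" parameter selects the output formats.
    """
    url = f"{BASE_URL}/{quote(str(job_id), safe='')}/results"
    if result_types:
        # Keep commas literal so the value reads "raw,parsed".
        url += "?" + urlencode({"type": ",".join(result_types)}, safe=",")
    return url
```

With no types given, the function returns the plain results URL, which would correspond to the default output.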
Batch query
Endpoint
Input
Output
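A batch submission can be prepared by grouping many URLs that share the same parameters. The helper below is only a sketch: it assumes that the batch endpoint accepts a list under the `url` key, that shared parameters such as `geo_location` apply to every item, and that there is a per-request cap (the default of 5000 is an assumption). The name `batch_payloads` is hypothetical.

```python
from typing import Iterator, List


def batch_payloads(
    source: str,
    urls: List[str],
    max_per_batch: int = 5000,
    **shared,
) -> Iterator[dict]:
    """Yield one payload per chunk of URLs.

    Assumption: a batch payload looks like a single-job payload, except
    that "url" holds a list of values instead of one string.
    """
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    for start in range(0, len(urls), max_per_batch):
        chunk = urls[start:start + max_per_batch]
        yield {"source": source, "url": chunk, **shared}
```

Each yielded dictionary could then be sent as the `json=` body of a POST request, in the same way as the single-job example.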
Get notifier IP address list
Endpoint
Input
Output
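One common use of the notifier IP list is to reject callback requests that do not come from a listed address. The check below is a sketch using the standard `ipaddress` module. It assumes you have already fetched the list and hold it as strings (single addresses or CIDR ranges); the response format of the endpoint is not shown on this page, and `is_notifier_ip` is a hypothetical helper name.

```python
import ipaddress
from typing import Iterable


def is_notifier_ip(remote_addr: str, allowed: Iterable[str]) -> bool:
    """Return True if remote_addr matches any allowed address or range."""
    try:
        remote = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # Not a valid IP address at all.
    for entry in allowed:
        try:
            # strict=False lets a bare address act as a one-host network.
            if remote in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # Skip malformed entries in the allow-list.
    return False
```

In a callback listener, this check would run against the client address of the incoming request before the body is processed.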
Scheduler

