import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'universal',
    'url': 'https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas',
}

# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload,
)

# Instead of a response with the job status and a results URL, this returns
# the JSON response with the results directly.
pprint(response.json())
# The URL has to be encoded to escape the `&` and `=` characters:
# URL: https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas
# Encoded URL: https%3A%2F%2Fwww.baidu.com%2Fs%3Fie%3Dutf-8%26f%3D8%26rsv_bp%3D1%26rsv_idx%3D1%26ch%3D%26tn%3Dbaidu%26bar%3D%26wd%3Dadidas
https://realtime.oxylabs.io/v1/queries?source=universal&url=https%3A%2F%2Fwww.baidu.com%2Fs%3Fie%3Dutf-8%26f%3D8%26rsv_bp%3D1%26rsv_idx%3D1%26ch%3D%26tn%3Dbaidu%26bar%3D%26wd%3Dadidas&access_token=12345abcde
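Rather than encoding the target URL by hand, you can let the standard library do it. The sketch below uses Python's `urllib.parse.urlencode`, which escapes every reserved character in the `url` value (including `&` and `=`), so none of the target's own parameters leak out as top-level parameters of the endpoint. `build_get_url` is a hypothetical helper name, and `12345abcde` is the placeholder token from the example URL above.

```python
from urllib.parse import urlencode

ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def build_get_url(target_url: str, access_token: str, source: str = "universal") -> str:
    """Return a GET query URL with the target URL fully percent-encoded.

    urlencode() quotes each value with quote_plus and safe="", so ':', '/',
    '?', '&' and '=' inside target_url are all escaped.
    """
    query = urlencode({"source": source, "url": target_url, "access_token": access_token})
    return f"{ENDPOINT}?{query}"


if __name__ == "__main__":
    target = "https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&ch=&tn=baidu&bar=&wd=adidas"
    print(build_get_url(target, "12345abcde"))
```

Encoding the whole target URL in one place avoids the easy mistake of leaving a trailing `&bar=&wd=...` unescaped, which would silently drop those parameters from the target request.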
Geolocation of the proxy used to retrieve the data. The complete list of supported locations can be found in the documentation.
-
render
Set to html to enable JavaScript rendering. NOTE: If you are observing low success rates or receiving empty content, try adding this parameter.
-
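The note about empty content suggests a simple fallback: send the request without rendering first and, only if the content comes back empty, retry with `render` set to `html`. This is a minimal sketch of that idea, not a documented retry policy; `send_realtime` and `scrape_with_render_fallback` are hypothetical names, and reading the page from `results[0]['content']` is an assumption about the response layout.

```python
from typing import Any, Callable, Dict

# A sender posts a payload and returns the scraped page content as a string.
Sender = Callable[[Dict[str, Any]], str]


def send_realtime(payload: Dict[str, Any]) -> str:
    """Post to the Realtime endpoint (assumes results[0]['content'] holds the page)."""
    import requests  # imported lazily so the retry logic below works without it

    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=("USERNAME", "PASSWORD"),
        json=payload,
    )
    response.raise_for_status()
    return response.json()["results"][0]["content"]


def scrape_with_render_fallback(payload: Dict[str, Any], send: Sender = send_realtime) -> str:
    """Scrape once; if the content is empty, retry once with render='html'."""
    content = send(payload)
    if content.strip() or payload.get("render") == "html":
        # Either we got content, or rendering was already on and a retry would not help.
        return content
    # Empty result: retry the same job with JavaScript rendering enabled.
    return send({**payload, "render": "html"})
```

Injecting `send` keeps the retry decision testable without network access; in real use you would call `scrape_with_render_fallback(payload)` with the payload from the example above.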
browser_instructions
Define your own browser instructions that are executed when rendering JavaScript.
-
parse
Set to true to return structured data, provided that you also define parsing_instructions.
false
parsing_instructions
Define your own parsing and data transformation logic that will be executed on an HTML scraping result. Read more: Parsing instructions examples.
If you want to use the same proxy for multiple requests, use this parameter. Set your session to any string you like, and we will assign a proxy to this ID and keep it for up to 10 minutes. After that, if you make another request with the same session ID, a new proxy will be assigned to that session ID.
-
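The session behaviour described above can be tracked on the client side. This sketch generates one session ID string and reports when the 10-minute window has elapsed, after which a request with the same ID is served by a new proxy. `ProxySession` is a hypothetical helper, the window is assumed to start at the first request made with the ID, and because this section does not show the parameter's key name, the sketch deliberately stops short of building a payload.

```python
import time
import uuid
from typing import Callable, Optional

# Per the description above, a proxy is kept for a session ID for up to 10 minutes.
SESSION_WINDOW_SECONDS = 10 * 60


class ProxySession:
    """Hypothetical client-side tracker for a session ID string."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.session_id = uuid.uuid4().hex  # any string can serve as a session ID
        self._first_used: Optional[float] = None

    def use(self) -> str:
        """Return the session ID, recording when it was first used."""
        # Assumption: the 10-minute window starts at the first request with this ID.
        if self._first_used is None:
            self._first_used = self._clock()
        return self.session_id

    def window_expired(self) -> bool:
        """True once 10 minutes have passed, so the same ID will get a new proxy."""
        if self._first_used is None:
            return False
        return self._clock() - self._first_used >= SESSION_WINDOW_SECONDS
```

Passing a custom `clock` makes the 10-minute logic easy to test; the default `time.monotonic` is unaffected by wall-clock changes.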
user_agent_type
Device type and browser. The full list of supported values can be found in the documentation.
desktop
context: http_method
Set it to post if you would like to make a POST request to your target URL.
get
context: content
Base64-encoded POST request body. It is only useful if http_method is set to post.
-
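The two context settings above work together: `http_method` switches the request to POST, and `content` carries the body in Base64. A minimal sketch of building such a payload follows; it assumes context settings are passed as a list of `{"key": ..., "value": ...}` objects, which this section does not show, and `build_post_payload` is a hypothetical helper name.

```python
import base64
import json
from typing import Any, Dict


def build_post_payload(url: str, body: bytes) -> Dict[str, Any]:
    """Build a payload asking the scraper to POST `body` to `url`.

    Assumption: context settings go in a list of {"key": ..., "value": ...} objects.
    """
    encoded_body = base64.b64encode(body).decode("ascii")  # content must be Base64-encoded
    return {
        "source": "universal",
        "url": url,
        "context": [
            {"key": "http_method", "value": "post"},  # content is only useful with post
            {"key": "content", "value": encoded_body},
        ],
    }


if __name__ == "__main__":
    # Hypothetical JSON body; any body serialized to bytes is encoded the same way.
    body = json.dumps({"query": "adidas"}).encode("utf-8")
    print(build_post_payload("https://example.com/api/search", body))
```

Taking `bytes` rather than a dict keeps the helper neutral about the body format: serialize JSON, form data, or anything else first, then encode.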
content_encoding
Set to base64 if you want to download images.
-
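With `content_encoding` set to `base64`, the image comes back as text and has to be decoded before it is saved. The sketch below shows the request payload and the decoding step; the image URL is a placeholder, and reading the data from `result["content"]` is an assumption about the response layout, as are the helper names `decode_base64_content` and `save_image`.

```python
import base64
from pathlib import Path
from typing import Any, Dict

# Example request payload for downloading an image (the URL is a placeholder).
IMAGE_PAYLOAD = {
    "source": "universal",
    "url": "https://example.com/image.png",
    "content_encoding": "base64",
}


def decode_base64_content(result: Dict[str, Any]) -> bytes:
    """Turn a Base64-encoded result back into raw bytes.

    Assumption: the encoded data is in result["content"].
    """
    return base64.b64decode(result["content"])


def save_image(result: Dict[str, Any], path: str) -> int:
    """Write the decoded image to `path` and return the number of bytes written."""
    return Path(path).write_bytes(decode_base64_content(result))
```

Writing with `write_bytes` matters here: saving the decoded data in text mode would corrupt binary image formats.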
context: follow_redirects
Set to true to make the scraper follow redirects. By default, redirects are followed up to a limit of 10 links, and the entire chain is treated as one scraping job.
-
context: successful_status_codes
Define one or more custom HTTP response codes upon which we should consider the scrape successful and return the content to you. This may be useful if you want us to return a 503 error page, or in other non-standard cases.
-
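The `follow_redirects` and `successful_status_codes` settings can be assembled into the `context` list programmatically. A minimal sketch, assuming the same `{"key": ..., "value": ...}` list shape for context entries (not shown in this section); `build_context` is a hypothetical helper, and it omits options left at their defaults so the API's own defaults apply.

```python
from typing import Any, Dict, Iterable, List


def build_context(follow_redirects: bool = False,
                  successful_status_codes: Iterable[int] = ()) -> List[Dict[str, Any]]:
    """Build context entries for follow_redirects and successful_status_codes."""
    context: List[Dict[str, Any]] = []
    if follow_redirects:
        context.append({"key": "follow_redirects", "value": True})
    # De-duplicate and sort so the payload is stable across calls.
    codes = sorted(set(successful_status_codes))
    if codes:
        # Treat these HTTP status codes as success, e.g. to receive a 503 error page.
        context.append({"key": "successful_status_codes", "value": codes})
    return context


if __name__ == "__main__":
    payload = {
        "source": "universal",
        "url": "https://example.com",
        "context": build_context(follow_redirects=True, successful_status_codes=[503]),
    }
    print(payload)
```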
Payload sample
In this sample, we include some parameters from the table above. Note that while these parameters are not always necessary or compatible within the same request, they illustrate how to format your requests.