If the page you want to scrape needs JavaScript to load all of its data into the DOM, you can add the render parameter to your request instead of setting up and running a headless browser yourself. Requests with this parameter are fully rendered, and the result is returned either as HTML or as a PNG screenshot, depending on the parameter value.
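A minimal sketch of assembling a request payload that includes the render parameter. build_payload and RENDER_VALUES are hypothetical names used only for illustration, not part of any Oxylabs SDK. The check only restricts render to the two values described under Parameter Values (html and png). The source and url values are passed through unchanged.

```python
from typing import Any, Dict

# The two render values described in this section (hypothetical constant name).
RENDER_VALUES = ("html", "png")


def build_payload(source: str, url: str, render: str = "html") -> Dict[str, Any]:
    """Return a payload dict that requests JavaScript rendering.

    Hypothetical helper for illustration only.
    """
    if render not in RENDER_VALUES:
        raise ValueError(f"render must be one of {RENDER_VALUES}, got {render!r}")
    return {"source": source, "url": url, "render": render}


print(build_payload("google", "https://www.example.com"))
```

The resulting dict can be passed as the json argument of the POST request shown in the example below.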
Parameter Values
HTML
Set the render parameter to html to get the raw output of the rendered page.
PNG (Screenshot)
Set the render parameter to png to get a Base64-encoded screenshot of the rendered page.
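A minimal sketch of saving the rendered output to disk. It assumes the response JSON has the shape {"results": [{"content": ...}]}, with the content field holding either the rendered HTML as text or the Base64-encoded PNG screenshot as a string. Verify this shape against a real response. save_rendered_output is a hypothetical helper name.

```python
import base64
from pathlib import Path
from typing import Any, Dict


def save_rendered_output(response_json: Dict[str, Any], render: str, path: str) -> Path:
    """Write the first result's content to `path` and return the Path.

    Hypothetical helper. ASSUMPTION: the response looks like
    {"results": [{"content": ...}]}; check this against your actual response.
    """
    content = response_json["results"][0]["content"]
    out = Path(path)
    if render == "png":
        # PNG output is a Base64 string; decode it back into raw image bytes.
        out.write_bytes(base64.b64decode(content))
    elif render == "html":
        # HTML output is the rendered page source, stored as UTF-8 text.
        out.write_text(content, encoding="utf-8")
    else:
        raise ValueError(f"unsupported render value: {render!r}")
    return out
```

For example, save_rendered_output(response.json(), "png", "screenshot.png") would write the decoded screenshot next to your script, provided the response follows the assumed shape.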
If you want to scrape an image and download it, refer to the Download Images section.
import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'google',
    'url': 'https://www.example.com',
    'render': 'html',
}

# Get response. Rendering takes longer, so allow up to 180 seconds.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
    timeout=180,
)

# Instead of a response with the job status and results URL, this returns
# the JSON response with the result.
pprint(response.json())
# The whole string you submit has to be URL-encoded.
https://realtime.oxylabs.io/v1/queries?source=google&url=https%3A%2F%2Fwww.example.com%2F&render=html&access_token=12345abcde
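One way to produce the URL-encoded query string above is Python's standard urllib.parse.urlencode, which percent-encodes every value (including the ":" and "/" characters in the target URL). This is a sketch; the access_token value is the placeholder from the example and must be replaced with your own.

```python
from urllib.parse import urlencode

# Parameters from the example above; access_token is a placeholder.
params = {
    "source": "google",
    "url": "https://www.example.com/",
    "render": "html",
    "access_token": "12345abcde",
}

# urlencode percent-encodes each value, so "https://www.example.com/"
# becomes "https%3A%2F%2Fwww.example.com%2F".
query_url = "https://realtime.oxylabs.io/v1/queries?" + urlencode(params)
print(query_url)
# prints https://realtime.oxylabs.io/v1/queries?source=google&url=https%3A%2F%2Fwww.example.com%2F&render=html&access_token=12345abcde
```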
JavaScript rendering makes scraping take longer. If you use the Realtime or Proxy Endpoint integration method, set the client-side timeout to 180 seconds.
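If you prefer to avoid third-party packages, the same 180-second client-side timeout can be set with the standard library's urllib.request. This is a sketch, not an official client: build_request, send, ENDPOINT, and CLIENT_TIMEOUT_SECONDS are hypothetical names, and the credentials are placeholders. Note that urlopen's timeout applies to each blocking socket operation (connecting, each read) rather than acting as a total deadline for the whole request.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://realtime.oxylabs.io/v1/queries"
CLIENT_TIMEOUT_SECONDS = 180  # recommended for rendered requests


def build_request(payload: dict, username: str, password: str) -> urllib.request.Request:
    """Prepare a POST request with HTTP Basic auth and a JSON body (hypothetical helper)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON body, waiting up to 180 s per socket operation."""
    with urllib.request.urlopen(request, timeout=CLIENT_TIMEOUT_SECONDS) as resp:
        return json.load(resp)
```

A call such as send(build_request({"source": "google", "url": "https://www.example.com", "render": "html"}, "user", "pass1")) raises an exception if the server stays silent longer than the timeout, so wrap it in try/except if you want to retry.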
To keep traffic consumption as low as possible, our system does not load unnecessary assets during page rendering.
Browser instructions
With our Headless Browser, you can also execute various browser instructions, such as clicking, scrolling, inputting text, waiting, and more. Read more: