JS Rendering & Browser Control

Learn how to use the render parameter and how to define browser instructions in Web Scraper API so you can scrape complex dynamic pages.

JavaScript Rendering

If the page you wish to scrape requires JavaScript to dynamically load all necessary data into the DOM, you can include a render parameter in your requests instead of setting up and using custom browser instructions manually. Requests with this parameter will be fully rendered, and the data will be stored in either an HTML file or a PNG screenshot, depending on the specified parameter.

HTML

Set the render parameter to html to get the raw output of the rendered page.

PNG (Screenshot)

Set the render parameter to png to get a Base64-encoded screenshot of the rendered page.
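
As a rough sketch of handling the png output, the snippet below decodes a Base64 string into bytes and checks the PNG file signature. Where the encoded string sits in the API response is not shown in this section, so `encoded` is a hypothetical stand-in, and `decode_png` is an illustrative helper name rather than part of the API.

```python
import base64

# First 8 bytes of every valid PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(encoded: str) -> bytes:
    """Decode a Base64 string and verify that it holds PNG data."""
    data = base64.b64decode(encoded)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with a PNG signature")
    return data


# Hypothetical stand-in for the screenshot string returned by the API.
encoded = base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii")
image_bytes = decode_png(encoded)
print(len(image_bytes))
```

The returned bytes can then be written to a file opened in binary mode.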

If you want to scrape an image and download it, please refer to this section.

Request sample

curl --user "user:pass" \
'https://realtime.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{"source": "universal", "url": "https://www.example.com", "render": "html"}'

Forcing rendering on specific pages

Some page types on specific domains require rendering to be scraped successfully because of their dynamic content. Our system automatically enforces rendering for these pages, even if the user has not set it explicitly.

We want our users to be fully aware of this when scraping the following pages:

This approach provides the best possible scraping experience, ensuring data accuracy and reliability from these challenging pages.

If you wish to disable rendering, you can do so by adding the following parameter to your requests:

Browser instructions

You can define your own browser instructions that are executed when rendering JavaScript.

Usage

To use browser instructions, provide a set of browser_instructions when creating a job.

Let’s say you want to search for the term pizza boxes on a website.

Example job parameters would look as follows:

Step 1. You must provide the "render": "html" parameter.

Step 2. Browser instructions should be described in the "browser_instructions" field.
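
The two steps can be sketched as a Python dictionary that serializes to the JSON job payload. This is a minimal illustration, not the original sample: the target URL and both XPath selectors are hypothetical placeholders, and the 5-second wait is assumed to be expressed through the general `wait_time_s` argument.

```python
import json

# Selector values and the target URL are hypothetical placeholders.
payload = {
    "source": "universal",
    "url": "https://www.example.com",
    "render": "html",  # Step 1: rendering must be enabled
    "browser_instructions": [  # Step 2: the instructions to execute
        {
            "type": "input",
            "value": "pizza boxes",
            "selector": {"type": "xpath", "value": "//input[@name='q']"},
        },
        {
            "type": "click",
            "selector": {"type": "xpath", "value": "//button[@type='submit']"},
        },
        # Assumption: the wait duration is set via wait_time_s.
        {"type": "wait", "wait_time_s": 5},
    ],
}

print(json.dumps(payload, indent=2))
```

The printed JSON could be passed as the `-d` body of the curl request sample shown earlier.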

The sample browser instructions above specify three actions: enter the search term pizza boxes into a search field, click the search button, and wait 5 seconds for the content to load.

The scraped result should look as follows:

Scraped HTML should look like this:

Fetching browser resources

We provide a standalone browser instruction for fetching browser resources.

The function is defined here:

Using fetch_resource causes the job to return the first occurrence of a Fetch/XHR resource that matches the provided format, instead of the HTML of the targeted page.

Let’s say we want to target a GraphQL resource that is fetched when visiting a product page organically in the browser. We will provide job information as such:
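
A hedged sketch of such job information follows. The product URL and the filter pattern are hypothetical, and the local `re.search` check only confirms that the pattern is a valid regular expression that matches a sample URL; how the service applies the filter (partial or full match) is not stated here.

```python
import json
import re

# Hypothetical filter: matches any resource URL containing "/graphql".
resource_filter = r"/graphql"

payload = {
    "source": "universal",
    "url": "https://www.example.com/product/123",  # hypothetical product page
    "render": "html",
    "browser_instructions": [
        {"type": "fetch_resource", "filter": resource_filter},
    ],
}

# Local sanity check that the pattern compiles and matches a sample URL.
assert re.search(resource_filter, "https://www.example.com/api/graphql?op=Product")
print(json.dumps(payload, indent=2))
```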

These instructions produce a result such as:

List of supported browser instructions

General arguments

All instructions defined below share a common set of arguments:

type

  • Type: Enum["click", "input", "scroll", "scroll_to_bottom", "wait", "wait_for_element", "fetch_resource"]

  • Description: Browser instruction type.

timeout_s

  • Type: int

  • Description: Time, in seconds, after which the action is skipped if it has not completed.

  • Restrictions: 0 < timeout_s <= 60

  • Default value: 5

wait_time_s

  • Type: int

  • Description: Time, in seconds, to wait before executing the next action.

  • Restrictions: 0 < wait_time_s <= 60

  • Default value: 0

on_error

  • Type: Enum["error", "skip"]

  • Description: Specifies what happens to the remaining instructions if this instruction fails:

    • "error": Stops the execution of browser instructions.

    • "skip": Continues with the next instruction.

  • Default value: "error"
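
The constraints listed above can be pre-checked on the client before a job is submitted. The helper below is only a sketch: `check_instruction` is a hypothetical name, it covers just `type`, `timeout_s`, and `on_error`, and it does not reproduce the service's own validation.

```python
ALLOWED_TYPES = {
    "click", "input", "scroll", "scroll_to_bottom",
    "wait", "wait_for_element", "fetch_resource",
}
ALLOWED_ON_ERROR = {"error", "skip"}


def check_instruction(instruction: dict) -> list:
    """Return a list of problems found in one browser instruction."""
    problems = []
    if instruction.get("type") not in ALLOWED_TYPES:
        problems.append(f"unsupported type: {instruction.get('type')!r}")
    if "timeout_s" in instruction:
        timeout = instruction["timeout_s"]
        if not isinstance(timeout, int) or not 0 < timeout <= 60:
            problems.append("timeout_s must be an int with 0 < timeout_s <= 60")
    # on_error defaults to "error" when omitted.
    if instruction.get("on_error", "error") not in ALLOWED_ON_ERROR:
        problems.append("on_error must be 'error' or 'skip'")
    return problems


print(check_instruction({"type": "click", "timeout_s": 5}))
print(check_instruction({"type": "hover", "timeout_s": 120}))
```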

Example with general arguments
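
A possible instruction that sets all general arguments is sketched below. The CSS selector is a hypothetical placeholder; the argument names and allowed values come from the list above.

```python
import json

# Hypothetical selector; the general arguments are the point of this example.
instruction = {
    "type": "click",
    "selector": {"type": "css", "value": "button.load-more"},
    "timeout_s": 10,     # skip the action if it has not completed in 10 s
    "wait_time_s": 2,    # wait 2 s before executing the next action
    "on_error": "skip",  # continue with the next instruction on failure
}

print(json.dumps(instruction, indent=2))
```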

Instructions

click

  • Description: Clicks an element and waits a set number of seconds.

  • Args:

    • type: str = "click"

    • selector: dict

      • type: Enum["xpath", "css", "text"]

      • value: str

Example:
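
One way a click instruction might look, written as a Python dictionary; the XPath is a hypothetical placeholder:

```python
import json

click = {
    "type": "click",
    # Hypothetical XPath pointing at a submit button.
    "selector": {"type": "xpath", "value": "//button[@type='submit']"},
}

print(json.dumps(click))
```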

input

  • Description: Enters text into the selected element.

  • Args:

    • type: str = "input"

    • selector: dict

      • type: Enum["xpath", "css", "text"]

      • value: str

    • value: str

Example:
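
A minimal sketch of an input instruction, assuming a hypothetical CSS selector for the search field:

```python
import json

text_input = {
    "type": "input",
    # Hypothetical CSS selector for the field that receives the text.
    "selector": {"type": "css", "value": "input#search"},
    "value": "pizza boxes",  # the text to enter
}

print(json.dumps(text_input))
```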

scroll

  • Description: Scrolls by a set number of pixels.

  • Args:

    • type: str = "scroll"

    • x: int

    • y: int

Example:
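
An illustrative scroll instruction is shown below. It assumes that `x` and `y` are horizontal and vertical pixel offsets; the axis convention is not spelled out in this section.

```python
import json

scroll = {
    "type": "scroll",
    "x": 0,    # assumed: horizontal offset in pixels
    "y": 800,  # assumed: vertical offset in pixels
}

print(json.dumps(scroll))
```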

scroll_to_bottom

  • Description: Scrolls to the bottom of the page for a set number of seconds.

  • Args:

    • type: str = "scroll_to_bottom"

Example:
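
A sketch of scroll_to_bottom, under the assumption that the duration is controlled by the general `timeout_s` argument (this section lists no instruction-specific duration argument):

```python
import json

scroll_to_bottom = {
    "type": "scroll_to_bottom",
    "timeout_s": 10,  # assumption: how long scrolling may continue
}

print(json.dumps(scroll_to_bottom))
```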

wait

  • Description: Waits a set number of seconds.

  • Args:

    • type: str = "wait"

Example:
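
A wait instruction could be written as follows, assuming the duration is given through the general `wait_time_s` argument:

```python
import json

wait = {
    "type": "wait",
    "wait_time_s": 5,  # assumption: number of seconds to wait
}

print(json.dumps(wait))
```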

wait_for_element

  • Description: Waits up to a set number of seconds for an element to load.

  • Args:

    • type: str = "wait_for_element"

    • selector: dict

      • type: Enum["xpath", "css", "text"]

      • value: str

Example:
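
The sketch below waits for an element located by its text. The text value is hypothetical, and the maximum wait is assumed to be set through the general `timeout_s` argument.

```python
import json

wait_for_element = {
    "type": "wait_for_element",
    # Hypothetical visible text of the element to wait for.
    "selector": {"type": "text", "value": "Load more items"},
    "timeout_s": 5,  # assumption: maximum time to wait for the element
}

print(json.dumps(wait_for_element))
```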

fetch_resource

  • Description: Fetches the first occurrence of a Fetch/XHR resource matching the set pattern.

  • Args:

    • type: str = "fetch_resource"

    • filter: str (regular expression)

    • on_error: Enum["error", "skip"]

Example:
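
A possible fetch_resource instruction with an explicit `on_error` value; the filter pattern is a hypothetical placeholder and is compiled locally only to confirm that it is a valid regular expression:

```python
import json
import re

fetch_resource = {
    "type": "fetch_resource",
    "filter": r"/graphql/item/",  # hypothetical pattern for the resource URL
    "on_error": "skip",
}

# Raises re.error if the pattern is not a valid regular expression.
re.compile(fetch_resource["filter"])
print(json.dumps(fetch_resource))
```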

Instruction validation

Any inconsistency in the instruction format results in a 400 status code and a corresponding error message.

For example, a payload such as:

Will result in:

Troubleshooting

Status codes

See our response codes outlined here. Status codes related to instruction validation are documented here.

Errors and warnings

If your browser instructions produce an error or warning, you’ll find it in the result under the keys browser_instructions_error or browser_instructions_warnings. For instance, if you send the following browser instructions and the expected XPath isn’t found on the page, the result will include a warning.

browser_instructions:

Results:

Possible errors and warnings

Unexpected error happened while converting browser instructions to actions.

Unexpected error happened while executing {action.type} browser instructions.

Action {action.type} timed out.

Unable to find selector type {selector.type} with value {selector.value} on the page.
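
Reading these keys from a result can be sketched as below. The helper name `collect_instruction_issues` and the shape of `sample_result` are hypothetical: the real response layout, and whether warnings arrive as a list or a single string, may differ. The warning text is the last message template above with example values filled in.

```python
def collect_instruction_issues(result: dict) -> dict:
    """Pick out browser instruction errors and warnings, if present."""
    keys = ("browser_instructions_error", "browser_instructions_warnings")
    return {key: result[key] for key in keys if result.get(key)}


# Hypothetical result fragment; the real response layout may differ.
sample_result = {
    "content": "<html>...</html>",
    "browser_instructions_warnings": [
        "Unable to find selector type xpath with value "
        "//div[@id='missing'] on the page."
    ],
}

issues = collect_instruction_issues(sample_result)
for key, value in issues.items():
    print(key, value)
```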
