Browser instructions (Beta)

When using Headless Browser, you can define your own browser instructions that are executed when rendering JavaScript.

How to use it?

To use browser instructions, provide a set of browser_instructions when creating a job.

Let’s say you want to search for the term pizza boxes on a website.

Example job parameters would look as follows:

curl -k -x unblock.oxylabs.io:60000 \
-U USERNAME:PASSWORD \
"https://www.ebay.com" \
-H "x-oxylabs-render: html" \
-H "x-oxylabs-browser-instructions: [{\"type\":\"input\",\"value\":\"pizza boxes\",\"selector\":{\"type\":\"xpath\",\"value\":\"\/\/input[@class='gh-tb ui-autocomplete-input']\"}},{\"type\":\"click\",\"selector\":{\"type\":\"xpath\",\"value\":\"\/\/input[@type='submit']\"}},{\"type\":\"wait\",\"wait_time_s\":5}]"

Step 1. You must provide the x-oxylabs-render: html header.

Step 2. Browser instructions should be described in the x-oxylabs-browser-instructions header.

The browser instructions provided as the header value must be JSON-escaped and contain no extra spaces.

The sample browser instructions above specify three actions: enter the search term pizza boxes into the search field, click the search button, and wait 5 seconds for the content to load.

The scraped result should look as follows:

<!doctype html><html>
Content after executing the instructions
</html>
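
Since the header value has to be compact JSON, it can be easier to generate it than to write it by hand. The following is a minimal sketch, assuming Python's standard json module. The to_header_value helper is a hypothetical name used for illustration, and the sketch only builds the headers without sending a request.

```python
import json

# Instructions from the example request: type a search term, click submit, wait.
instructions = [
    {
        "type": "input",
        "value": "pizza boxes",
        "selector": {
            "type": "xpath",
            "value": "//input[@class='gh-tb ui-autocomplete-input']",
        },
    },
    {
        "type": "click",
        "selector": {"type": "xpath", "value": "//input[@type='submit']"},
    },
    {"type": "wait", "wait_time_s": 5},
]


def to_header_value(steps):
    """Serialize instructions as compact JSON (hypothetical helper).

    The separators argument removes the spaces json.dumps would
    otherwise place after commas and colons.
    """
    return json.dumps(steps, separators=(",", ":"))


header_value = to_header_value(instructions)

# Headers as used in the curl example; pass them to the HTTP client of your choice.
headers = {
    "x-oxylabs-render": "html",
    "x-oxylabs-browser-instructions": header_value,
}
print(header_value)
```

Note that json.dumps leaves forward slashes unescaped, whereas the curl example writes them as \/. Both forms are valid JSON and decode to the same string.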

Fetching browser resources

We provide a standalone browser instruction for fetching browser resources.

The function is defined here:

When fetch_resource is used, the job returns the first Fetch/XHR resource that matches the provided filter, instead of the HTML of the targeted page.

Let’s say we want to target a GraphQL resource that is fetched when visiting a product page organically in the browser. The job parameters would look as follows:

curl -k -x unblock.oxylabs.io:60000 \
-U USERNAME:PASSWORD \
"https://www.example.com/product-page/123" \
-H "x-oxylabs-render: html" \
-H "x-oxylabs-browser-instructions: [{\"type\":\"fetch_resource\",\"filter\":\"\/graphql\/product-info\/123\"}]"

These instructions return the matched resource instead of the page HTML, for example:

{"product_id": 123, "description": "", "price": 456}
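
The fetch_resource flow can be sketched the same way. The example below is a minimal illustration, assuming the matched resource is a JSON object like the sample result. The parse_resource helper is a hypothetical name, and the body is the sample copied from this page, not a live response.

```python
import json

# Filter value copied from the example request.
resource_filter = "/graphql/product-info/123"

# A single fetch_resource instruction, serialized without extra spaces.
instructions = [{"type": "fetch_resource", "filter": resource_filter}]
header_value = json.dumps(instructions, separators=(",", ":"))

# Sample body copied from the example result; a real job returns
# whatever the matched resource contains.
sample_body = '{"product_id": 123, "description": "", "price": 456}'


def parse_resource(body):
    """Decode the returned resource, assuming it is a JSON object (hypothetical helper)."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


product = parse_resource(sample_body)
print(header_value)
print(product["product_id"], product["price"])
```

Because the job returns the resource itself, the response body can be decoded directly, with no HTML parsing step.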

List of supported browser instructions

Status codes

See our response codes outlined here.

Status codes related to instruction validation are documented here.
