Browser instructions (Beta)

When using Headless Browser, you can define your own browser instructions that are executed when rendering JavaScript.

How to use browser instructions

To use browser instructions, provide a set of browser_instructions when creating a job.

Let’s say you want to search for the term pizza boxes on a website.

The job parameters would look as follows:

{
    "source": "universal",
    "url": "https://www.ebay.com/",
    "render": "html",
    "browser_instructions": [
        {
            "type": "input",
            "value": "pizza boxes",
            "selector": {
                "type": "xpath",
                "value": "//input[@class='gh-tb ui-autocomplete-input']"
            }
        },
        {
            "type": "click",
            "selector": {
                "type": "xpath",
                "value": "//input[@type='submit']"
            }
        },
        {
            "type": "wait",
            "wait_time_s": 5
        }
    ]
}
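The same job can be assembled in code. Below is a minimal Python sketch, assuming you build jobs as plain dictionaries. The helper names (xpath, input_action, click_action, wait_action, build_job, build_request) are hypothetical. The ENDPOINT URL and the use of HTTP Basic authentication are assumptions that this page does not state, so check them against your integration's documentation. build_request only prepares the request and does not send it.

```python
import base64
import json
import urllib.request

# Assumption: Realtime integration endpoint; not stated on this page.
ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def xpath(value: str) -> dict:
    """Build an XPath selector object."""
    return {"type": "xpath", "value": value}


def input_action(text: str, selector: dict) -> dict:
    """Type `text` into the element matched by `selector`."""
    return {"type": "input", "value": text, "selector": selector}


def click_action(selector: dict) -> dict:
    """Click the element matched by `selector`."""
    return {"type": "click", "selector": selector}


def wait_action(seconds: int) -> dict:
    """Pause for a fixed number of seconds."""
    return {"type": "wait", "wait_time_s": seconds}


def build_job(url: str, instructions: list) -> dict:
    """Assemble job parameters; browser instructions need render=html."""
    return {
        "source": "universal",
        "url": url,
        "render": "html",
        "browser_instructions": instructions,
    }


def build_request(job: dict, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request with HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(job).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


# Reproduces the eBay example above.
job = build_job(
    "https://www.ebay.com/",
    [
        input_action("pizza boxes", xpath("//input[@class='gh-tb ui-autocomplete-input']")),
        click_action(xpath("//input[@type='submit']")),
        wait_action(5),
    ],
)
```

To submit the job, you could pass the prepared request to urllib.request.urlopen with your own credentials, once you have confirmed the endpoint for your integration.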

Step 1. You must provide the "render": "html" parameter.

Step 2. Browser instructions should be described in the "browser_instructions" field.

The sample browser instructions above enter the search term pizza boxes into the search field, click the search button, and wait 5 seconds for the content to load.
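Because both steps are hard requirements, a client can check a job before submitting it. The following Python function is a hypothetical pre-flight check that mirrors Steps 1 and 2. validate_job is not part of the API, and the service still performs its own validation.

```python
def validate_job(job: dict) -> None:
    """Hypothetical client-side check; raises ValueError if a step is unmet."""
    # Step 1: browser instructions run only when rendering is enabled.
    if job.get("render") != "html":
        raise ValueError('browser instructions require "render": "html"')
    # Step 2: instructions go in a non-empty "browser_instructions" list.
    instructions = job.get("browser_instructions")
    if not isinstance(instructions, list) or not instructions:
        raise ValueError('"browser_instructions" must be a non-empty list')
    # Every instruction object needs a "type" field.
    for i, step in enumerate(instructions):
        if not isinstance(step, dict) or "type" not in step:
            raise ValueError(f'instruction {i} is missing a "type" field')


# A valid job passes silently.
validate_job({
    "source": "universal",
    "url": "https://www.ebay.com/",
    "render": "html",
    "browser_instructions": [{"type": "wait", "wait_time_s": 5}],
})
```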

The scraped result should look as follows:

{
  "results": [
    {
      "content": "<!doctype html><html>Content after executing the instructions</html>",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "page": 1,
      "url": "https://www.ebay.com/",
      "job_id": "7117835067442906113",
      "status_code": 200
    }
  ]
}
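Here is a minimal Python sketch for reading the rendered HTML out of a response like the one above. It assumes the response body is the JSON shown, shortened here. first_html is a hypothetical helper. Treating any status_code other than 200 as a failure is a simplification.

```python
import json

# Shortened copy of the example response.
raw = """{
  "results": [
    {
      "content": "<!doctype html><html>Content after executing the instructions</html>",
      "page": 1,
      "url": "https://www.ebay.com/",
      "job_id": "7117835067442906113",
      "status_code": 200
    }
  ]
}"""


def first_html(response_text: str) -> str:
    """Return the HTML of the first result; raise if its status is not 200."""
    result = json.loads(response_text)["results"][0]
    if result["status_code"] != 200:
        raise RuntimeError(f"job {result['job_id']} returned {result['status_code']}")
    return result["content"]


html = first_html(raw)
print(html[:15])  # <!doctype html>
```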


Fetching browser resources

We provide a standalone browser instruction for fetching browser resources.

The fetch_resource function is defined in the list of supported browser instructions.

When you use fetch_resource, the job returns the first Fetch/XHR resource that matches the provided format, instead of the HTML of the targeted page.

Let’s say we want to target a GraphQL resource that is fetched when a product page is visited organically in the browser. The job parameters would look as follows:

{
    "source": "universal",
    "url": "https://www.example.com/product-page/123",
    "render": "html",
    "browser_instructions": [
        {
            "type": "fetch_resource",
            "format": "/graphql/product-info/123"
        }
    ]
}
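A fetch_resource job can be built the same way as any other job. In this Python sketch, fetch_resource_job is a hypothetical helper. As in the example above, fetch_resource is the only instruction in the list.

```python
import json


def fetch_resource_job(url: str, resource_format: str) -> dict:
    """Hypothetical helper: a job that returns the first Fetch/XHR resource
    matching `resource_format` instead of the page HTML."""
    return {
        "source": "universal",
        "url": url,
        "render": "html",
        "browser_instructions": [
            {"type": "fetch_resource", "format": resource_format},
        ],
    }


job = fetch_resource_job(
    "https://www.example.com/product-page/123",
    "/graphql/product-info/123",
)
print(json.dumps(job, indent=4))
```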

These instructions produce the following result:

{
  "results": [
    {
      "content": "{'product_id': 123, 'description': '', 'price': 123}",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "page": 1,
      "url": "https://example.com/v1/graphql/product-info/123/",
      "job_id": "7117835067442906114",
      "status_code": 200
    }
  ]
}
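The content field holds the fetched resource as a string. The sample above uses single quotes, so it is not strict JSON. A defensive client might try json.loads first and fall back to ast.literal_eval, which parses Python literals without executing code. parse_resource is a hypothetical helper, and the fallback is needed only if real responses look like the sample.

```python
import ast
import json


def parse_resource(content: str):
    """Decode a fetched resource body: JSON first, Python literal as fallback."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # ast.literal_eval evaluates literals only, never arbitrary code.
        return ast.literal_eval(content)


product = parse_resource("{'product_id': 123, 'description': '', 'price': 123}")
print(product["price"])  # 123
```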

List of supported browser instructions

See the List of instructions page.

Status codes

See our response codes outlined here.

Status codes related to instruction validation are documented here.

Errors and warnings

If your browser instructions produce an error or a warning, you’ll find it in the result under the browser_instructions_error or browser_instructions_warnings key. For instance, if you send the following browser instructions and the expected XPath selector isn’t found on the page, the result will include a warning.

browser_instructions:

[
    {
        "type": "input", 
        "selector": {
            "type": "xpath",
            "value": "//input[@type='search']"
        },
        "value": "oxylabs"
    }
]

Results:

{
  "results": [
    {
      "content": "<!doctype html><html>Content after executing the instructions</html>",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "browser_instructions_warnings": [
        {
          "action_type": "input",
          "msg": "Unable to find selector type `xpath` with value `//input[@type='search']` on the page."
        }
      ],
      "page": 1,
      "url": "https://example.com",
      "job_id": "7117835067442906113",
      "status_code": 200
    }
  ]
}

Possible errors and warnings

Unexpected error happened while converting browser instructions to actions.

Unexpected error happened while executing {action.type} browser instructions.

Action {action.type} timed out.

Unable to find selector type {selector.type} with value {selector.value} on the page.
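The following Python sketch collects problems from a single result. It assumes browser_instructions_warnings is a list of objects with action_type and msg fields, as in the example result above. This page does not show the exact shape of browser_instructions_error, so the helper accepts a list, a single object, or a plain string. The regular expression targets the "Unable to find selector" message and treats the backticks seen in the sample output as optional. All function names are hypothetical.

```python
import re

# Matches the "Unable to find selector ..." message; backticks around the
# selector type and value are optional (the sample output includes them).
SELECTOR_MISSING = re.compile(
    r"Unable to find selector type `?(?P<type>\S+?)`? with value "
    r"`?(?P<value>.+?)`? on the page\."
)


def _as_list(value) -> list:
    """Normalize an error/warning field to a list of dicts.

    Assumption: the shape of browser_instructions_error is not documented
    here, so accept None, a string, a single object, or a list.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [{"msg": value}]
    if isinstance(value, dict):
        return [value]
    return list(value)


def instruction_problems(result: dict) -> list:
    """Return (level, action_type, msg) tuples for one result object."""
    problems = []
    for key, level in (
        ("browser_instructions_error", "error"),
        ("browser_instructions_warnings", "warning"),
    ):
        for entry in _as_list(result.get(key)):
            problems.append((level, entry.get("action_type"), entry.get("msg", "")))
    return problems


def missing_selectors(result: dict) -> list:
    """Return (selector_type, selector_value) pairs that were not found."""
    pairs = []
    for _level, _action, msg in instruction_problems(result):
        match = SELECTOR_MISSING.search(msg)
        if match:
            pairs.append((match["type"], match["value"]))
    return pairs


result = {
    "status_code": 200,
    "browser_instructions_warnings": [
        {
            "action_type": "input",
            "msg": "Unable to find selector type `xpath` with value "
            "`//input[@type='search']` on the page.",
        }
    ],
}
print(missing_selectors(result))  # [('xpath', "//input[@type='search']")]
```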
