Custom Parser

Custom Parser is a free Scraper APIs feature that lets you define your own parsing and data processing logic, which is then executed on the raw scraping result.

You can use CSS or XPath selectors to select elements in the HTML DOM.

To use Custom Parser, pass a JSON object with instructions in the parsing_instructions parameter when submitting a job.

If you are using XPath selectors:

{
    "source": "universal_ecommerce",
    "url": "https://example.com",
    "parse": true,
    "parsing_instructions": {
        "title": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//h1/text()"]
                }
            ]
        }
    }
}

You can conveniently use the text() function with XPath, which extracts the text value of the selected node.
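If you build job payloads in code, the XPath instructions above can be assembled programmatically. The following is a minimal Python sketch: the helper names xpath_field and build_payload are hypothetical and not part of any SDK, and how the payload is actually submitted (endpoint, authentication) is left out.

```python
import json


def xpath_field(expression: str) -> dict:
    """Build the instructions for one field extracted with a single XPath expression."""
    return {"_fns": [{"_fn": "xpath_one", "_args": [expression]}]}


def build_payload(url: str, fields: dict) -> dict:
    """Assemble a job payload; `fields` maps output names to field instructions."""
    return {
        "source": "universal_ecommerce",
        "url": url,
        "parse": True,  # serialized as JSON `true`
        "parsing_instructions": fields,
    }


# Same payload as the XPath example above.
payload = build_payload("https://example.com", {"title": xpath_field("//h1/text()")})
print(json.dumps(payload, indent=4))
```

Keeping the field instructions in a small helper makes it easier to add more fields later without repeating the nested _fns structure.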

If you are using CSS selectors, you’ll have to string two functions together: the first one will select the h1 element, while the second one will extract its text:

{
    "source": "universal_ecommerce",
    "url": "https://example.com",
    "parse": true,
    "parsing_instructions": {
        "title": {
            "_fns": [
                {"_fn": "css_one", "_args": ["body > div:nth-child(1) > h1"]},
                {"_fn": "element_text"}
            ]
        }
    }
}
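Because functions in _fns run in order, with each one receiving the output of the previous one, the chain can be described as a simple list of steps. A rough Python sketch, assuming a hypothetical helper named pipeline (illustrative only, not a documented API):

```python
def pipeline(*steps) -> dict:
    """Chain functions in order.

    Each step is either a function name (no arguments)
    or a (name, args) tuple.
    """
    fns = []
    for step in steps:
        if isinstance(step, str):
            fns.append({"_fn": step})
        else:
            name, args = step
            fns.append({"_fn": name, "_args": list(args)})
    return {"_fns": fns}


# Select the h1 element first, then extract its text.
title = pipeline(
    ("css_one", ["body > div:nth-child(1) > h1"]),
    "element_text",
)
print(title)
```

The resulting dictionary can be placed under "title" in parsing_instructions, matching the CSS example above.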

The result will look like this:

{
    "results": [
        {
            "content": {
                "title": "Example Domain",
                "parse_status_code": 12000
            },
            "created_at": "2023-03-23 14:47:49",
            "updated_at": "2023-03-23 14:47:58",
            "page": 1,
            "url": "https://example.com",
            "job_id": "7044681146457663489",
            "status_code": 200
        }
    ]
}
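Once a response like the one above is received, the parsed fields can be pulled out of each result's content object. The sketch below is one possible way to do that in Python; it assumes that 12000, the parse_status_code value shown in the example, indicates a successful parse, and the function name parsed_fields is hypothetical.

```python
import json

# Assumption: 12000 (the value in the example response) means the parse succeeded.
SUCCESS_CODE = 12000


def parsed_fields(response: dict) -> list:
    """Return the parsed content of each result, without the status code."""
    out = []
    for result in response.get("results", []):
        content = dict(result.get("content", {}))
        code = content.pop("parse_status_code", None)
        if code != SUCCESS_CODE:
            raise ValueError(
                f"job {result.get('job_id')}: unexpected parse_status_code {code}"
            )
        out.append(content)
    return out


sample = json.loads("""
{
    "results": [
        {
            "content": {"title": "Example Domain", "parse_status_code": 12000},
            "url": "https://example.com",
            "job_id": "7044681146457663489",
            "status_code": 200
        }
    ]
}
""")
print(parsed_fields(sample))  # [{'title': 'Example Domain'}]
```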

The documentation also provides a list of available parsing and data transformation functions, along with instruction examples.

Getting the HTML content of a parsed job

To retrieve the raw HTML result of a parsed job, add ?type=raw to the end of the result retrieval URL.
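As a small illustration, the type=raw parameter can be appended with Python's standard library so that any existing query string is preserved. The URL in this sketch is a placeholder, not a real endpoint, and raw_result_url is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def raw_result_url(result_url: str) -> str:
    """Return the result URL with type=raw set, replacing any existing type value."""
    parts = urlsplit(result_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "type"
    ]
    query.append(("type", "raw"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only; use your actual result retrieval URL.
print(raw_result_url("https://api.example.com/jobs/7044681146457663489/results"))
```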
