
Generating parsing instructions via API

You can generate parsing instruction sets via API by providing URLs and describing what data points you would like to parse. Upon receiving the generated parsing instructions, you may save them as a parser preset or simply send the instructions with your scraping request.

You can also generate parsing instructions via OxyCopilot on our API playground.

Generate instructions from prompt

You can generate parsing instructions by inputting a free-text description of the data points you would like to parse and giving us a few URLs that belong to the same page type. The API will respond with a set of parsing instructions.

  • Endpoint: https://data.oxylabs.io/v1/parsers/generate-instructions/prompt

  • Method: POST

  • Authentication: Basic

  • Request headers: Content-Type: application/json

Sample payload

{ 
  "prompt_text": "Parse title of the product, main price, developer name and platform name.",
  "urls": [
    "https://sandbox.oxylabs.io/products/1",
    "https://sandbox.oxylabs.io/products/2",
    "https://sandbox.oxylabs.io/products/4"
  ],
  "render": false
}

Parameter     Description
prompt_text   Free-text description of the data points to be parsed.
urls          List of URLs exemplifying the page type you would like to get parsing instructions for.
render        Whether or not JS rendering should be used to fetch the required content.

- mandatory parameter
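
The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names `build_prompt_request` and `generate_instructions` are hypothetical, and `USERNAME` / `PASSWORD` are placeholders for your own API credentials.

```python
import base64
import json
import urllib.request

PROMPT_ENDPOINT = "https://data.oxylabs.io/v1/parsers/generate-instructions/prompt"


def build_prompt_request(username, password, prompt_text, urls, render=False):
    """Build (but do not send) a POST request for the prompt endpoint."""
    payload = {"prompt_text": prompt_text, "urls": urls, "render": render}
    # Basic authentication: base64-encoded "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        PROMPT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def generate_instructions(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_prompt_request(
    "USERNAME",  # placeholder credentials
    "PASSWORD",
    "Parse title of the product, main price, developer name and platform name.",
    [
        "https://sandbox.oxylabs.io/products/1",
        "https://sandbox.oxylabs.io/products/2",
        "https://sandbox.oxylabs.io/products/4",
    ],
)
# Uncomment to perform the real network call with valid credentials:
# result = generate_instructions(request)
```

Building the request separately from sending it makes it easy to inspect the payload and headers before any network call happens.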

Sample response

{
    "parsing_instructions": {
        "developer_name": {
            "_fns": [
                {
                    "_args": [
                        "//div[contains(@class, 'brand-wrapper')]//span[@class='brand developer']"
                    ],
                    "_fn": "xpath"
                },
                {
                    "_args": [
                        "normalize-space(.)"
                    ],
                    "_fn": "xpath"
                },
                {
                    "_args": " ",
                    "_fn": "join"
                }
            ]
        },
        "main_price": {
            "_fns": [
                {
                    "_args": [
                        "//div[contains(@class, 'product-info-wrapper')]//div[contains(@class, 'price')]/text()"
                    ],
                    "_fn": "xpath_one"
                },
                {
                    "_fn": "amount_from_string"
                }
            ]
        },
        "title": {
            "_fns": [
                {
                    "_args": [
                        "//div[contains(@class, 'product-info-wrapper')]//h2/text()"
                    ],
                    "_fn": "xpath_one"
                },
                {
                    "_args": [
                        "^\\s*(.[\\s\\S]*?)\\s*$",
                        1
                    ],
                    "_fn": "regex_search"
                }
            ]
        }
    },
    "prompt_schema": {
        "properties": {
            "developer_name": {
                "description": "Developer name.",
                "title": "Developer Name",
                "type": "string"
            },
            "main_price": {
                "description": "Main price of the product.",
                "title": "Main Price",
                "type": "number"
            },
            "platform_name": {
                "description": "Platform name.",
                "title": "Platform Name",
                "type": "string"
            },
            "title": {
                "description": "Title of the product.",
                "title": "Title",
                "type": "string"
            }
        },
        "required": [
            "title",
            "main_price",
            "developer_name",
            "platform_name"
        ],
        "title": "Fields",
        "type": "object"
    }
}
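
In the sample response above, `prompt_schema` lists `platform_name` as required, yet `parsing_instructions` contains no entry for it. This section does not say when or why that can happen, so it may be worth checking the response before saving it as a preset. The helper below is a hypothetical sketch of such a check; `missing_fields` is not part of the API, and the response is trimmed to the keys that matter.

```python
def missing_fields(response):
    """Return required schema fields that have no parsing instructions."""
    instructions = response.get("parsing_instructions", {})
    required = response.get("prompt_schema", {}).get("required", [])
    return [name for name in required if name not in instructions]


# Trimmed copy of the sample response: the "_fns" pipelines are omitted
# because only the field names are compared here.
sample_response = {
    "parsing_instructions": {
        "developer_name": {"_fns": []},
        "main_price": {"_fns": []},
        "title": {"_fns": []},
    },
    "prompt_schema": {
        "required": ["title", "main_price", "developer_name", "platform_name"],
    },
}

print(missing_fields(sample_response))  # prints ['platform_name']
```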

Generate instructions from JSON schema

In some cases, you may need the parsed data to follow a specific JSON schema. Use this endpoint to get parsing instructions that strictly adhere to the schema you provide.

  • Endpoint: https://data.oxylabs.io/v1/parsers/generate-instructions/schema

  • Method: POST

  • Authentication: Basic

  • Request headers: Content-Type: application/json

Sample payload

{
  "urls": [
    "https://oxylabs.io",
    "https://example.com",
    "https://bbc.co.uk"
  ],
  "prompt_schema": {
    "properties": {
      "links": {
        "description": "An array of URL strings",
        "type": "array",
        "items": {
          "type": "string",
          "description": "A URL"
        }
      }
    },
    "required": [
      "links"
    ]
  },
  "render": false
}

Parameter       Description
prompt_schema   JSON schema describing the required parser output.
urls            List of URLs exemplifying the page type you would like to get parsing instructions for.
render          Whether or not JS rendering should be used to fetch the required content.

- mandatory parameter
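
A schema-based request can be sketched the same way in Python. The function name `build_schema_request` is hypothetical, `USERNAME` / `PASSWORD` are placeholders, and the check that every `required` field is declared under `properties` is a local sanity check added for illustration, not a rule stated by the API.

```python
import base64
import json
import urllib.request

SCHEMA_ENDPOINT = "https://data.oxylabs.io/v1/parsers/generate-instructions/schema"


def build_schema_request(username, password, prompt_schema, urls, render=False):
    """Build (but do not send) a POST request for the schema endpoint."""
    # Local sanity check (an assumption, not an API rule): every field named
    # in "required" should also be declared under "properties".
    properties = prompt_schema.get("properties", {})
    undeclared = [n for n in prompt_schema.get("required", []) if n not in properties]
    if undeclared:
        raise ValueError(f"required fields missing from properties: {undeclared}")

    payload = {"urls": urls, "prompt_schema": prompt_schema, "render": render}
    # Basic authentication: base64-encoded "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        SCHEMA_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


links_schema = {
    "properties": {
        "links": {
            "description": "An array of URL strings",
            "type": "array",
            "items": {"type": "string", "description": "A URL"},
        }
    },
    "required": ["links"],
}

request = build_schema_request(
    "USERNAME",  # placeholder credentials
    "PASSWORD",
    links_schema,
    ["https://oxylabs.io", "https://example.com", "https://bbc.co.uk"],
)
# Sending requires valid credentials, for example:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.loads(response.read().decode("utf-8"))
```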

Sample response

{
    "parsing_instructions": {
        "links": {
            "_fns": [
                {
                    "_args": [
                        "//a[@href and normalize-space(@href) != '']/@href"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    }
}
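
Once you have the generated instructions, you can send them along with a scraping request, as mentioned at the top of this page. The scraping request format is not described in this section, so the sketch below rests on assumptions: the payload keys `source`, `url`, `parse`, and `parsing_instructions`, and the value `"universal"`, should be checked against the scraping API reference. The helper name `build_scraping_payload` is hypothetical.

```python
import json


def build_scraping_payload(target_url, generated_response):
    """Combine a target URL with previously generated parsing instructions."""
    return {
        "source": "universal",  # assumed source name, verify in the API reference
        "url": target_url,
        "parse": True,  # assumed flag that enables parsing
        "parsing_instructions": generated_response["parsing_instructions"],
    }


# Response shape taken from the sample response above.
generated = {
    "parsing_instructions": {
        "links": {
            "_fns": [
                {
                    "_args": ["//a[@href and normalize-space(@href) != '']/@href"],
                    "_fn": "xpath",
                }
            ]
        }
    }
}

payload = build_scraping_payload("https://example.com", generated)
print(json.dumps(payload, indent=2))
```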
