Parser Presets

Learn how parser presets work and how to use them in your scraping jobs.

You can save, reuse, and modify custom parsing instructions through the Web Scraper API. Once you create a parser preset, we'll host it on our system, enabling you to reference it in your scraping jobs via the parser_preset parameter in the payload.

This feature offers several key capabilities:

  • Save and manage your own parsers on our system

  • Easily reuse presets across multiple scraping jobs

  • Create, retrieve, update, delete, and list all presets

  • Access performance and usage statistics of a preset

  • Adapt to changing sites using self-healing presets

API reference

Endpoint: https://data.oxylabs.io/v1/parsers/presets

The table below lists each available operation with its request method and endpoint path:

| Action | Request method | Path |
| --- | --- | --- |
| Create a preset | POST | /v1/parsers/presets |
| Retrieve a preset | GET | /v1/parsers/presets/{preset_name} |
| Update a preset | PUT | /v1/parsers/presets/{preset_name} |
| Delete a preset | DELETE | /v1/parsers/presets/{preset_name} |
| List all presets | GET | /v1/parsers/presets |
| View usage and performance statistics | GET | /v1/parsers/presets/{preset_name}/stats |
| Track self-healing changes | GET | /v1/parsers/presets/{preset_name}/changelog |
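
The operations above can be captured as a small lookup that builds the matching request. This is a minimal sketch using only Python's standard library, not an official client: the `ROUTES` mapping and the `build_request` helper are hypothetical names, and HTTP Basic authentication with your API credentials is an assumption, since this page does not describe authentication.

```python
import base64
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://data.oxylabs.io"

# Action name -> (HTTP method, path template), taken from the operations table.
ROUTES = {
    "create": ("POST", "/v1/parsers/presets"),
    "retrieve": ("GET", "/v1/parsers/presets/{preset_name}"),
    "update": ("PUT", "/v1/parsers/presets/{preset_name}"),
    "delete": ("DELETE", "/v1/parsers/presets/{preset_name}"),
    "list": ("GET", "/v1/parsers/presets"),
    "stats": ("GET", "/v1/parsers/presets/{preset_name}/stats"),
    "changelog": ("GET", "/v1/parsers/presets/{preset_name}/changelog"),
}


def build_request(action, username, password, preset_name=None, body=None):
    """Build (but do not send) a request for one preset operation."""
    method, template = ROUTES[action]
    if "{preset_name}" in template:
        if not preset_name:
            raise ValueError(f"'{action}' requires a preset name")
        path = template.format(preset_name=quote(preset_name, safe=""))
    else:
        path = template

    # Assumption: the API accepts HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}

    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"

    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

To send a request, pass the returned object to `urllib.request.urlopen` and decode the JSON response body.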

Enable self-healing

Parser presets include a self-healing function that helps maintain parsers and their success rates as websites change. When enabled, a preset automatically repairs itself by adjusting its parsing instructions in the background, with no manual input required.

To enable self-healing for your custom parser preset, include the following mandatory parameters when creating or updating a preset:

| Parameter | Description |
| --- | --- |
| self_heal | Turns on the self-healing functionality when set to true. |
| prompt_schema | A JSON schema describing the required parser output. You can create the schema automatically when generating parsers with the API. |
| urls | A list of up to 5 URLs of the same page type. We recommend providing 3-5 URLs to help the parser adapt to different layouts and improve parsing accuracy. |

Payload sample

The payload example shown here enables self-healing by updating an existing preset.

Endpoint: PUT https://data.oxylabs.io/v1/parsers/presets/{preset_name}

{
    "self_heal": true,
    "urls": ["https://sandbox.oxylabs.io/products"],
    "prompt_schema": {
        "properties": {
            "product_titles": {
                "description": "Title of each product.",
                "items": {
                    "type": "string"
                },
                "maxItems": 5,
                "title": "Product Titles",
                "type": "array"
            }
        },
        "required": [
            "product_titles"
        ],
        "title": "Fields",
        "type": "object"
    }
}
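
A payload like the one above can also be assembled and checked in code before it is sent. The following is a minimal sketch, assuming Python on the client side; `build_self_heal_payload` is a hypothetical helper, and the lower bound of one URL is an assumption (the parameter description only states the upper limit of 5).

```python
import json

MAX_URLS = 5  # upper limit stated in the parameter description


def build_self_heal_payload(urls, prompt_schema):
    """Return an update payload that turns on self-healing."""
    urls = list(urls)
    # Assumption: at least one URL is needed, since the parameter is mandatory.
    if not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"expected 1 to {MAX_URLS} URLs, got {len(urls)}")
    if not isinstance(prompt_schema, dict):
        raise TypeError("prompt_schema must be a JSON schema object (dict)")
    return {"self_heal": True, "urls": urls, "prompt_schema": prompt_schema}


# Same schema as in the payload sample.
schema = {
    "type": "object",
    "title": "Fields",
    "properties": {
        "product_titles": {
            "type": "array",
            "title": "Product Titles",
            "description": "Title of each product.",
            "items": {"type": "string"},
            "maxItems": 5,
        }
    },
    "required": ["product_titles"],
}

payload = build_self_heal_payload(["https://sandbox.oxylabs.io/products"], schema)
print(json.dumps(payload, indent=4))
```

Validating the URL count locally surfaces an oversized list before the request reaches the API.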

Usage examples

Create a preset

Endpoint: POST https://data.oxylabs.io/v1/parsers/presets

Payload:

{
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        }
    }
}

Output:

{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:40:22"
}

Use a preset

Endpoint: POST https://realtime.oxylabs.io/v1/queries

Payload:

{
    "source": "universal",
    "url": "https://sandbox.oxylabs.io/products",
    "parse": true,
    "parser_preset": "my_new_parser"
}

Output:

{
    "results": [
        {
            "content": {
                "titles": [
                    "The Legend of Zelda: Ocarina of Time",
                    "Super Mario Galaxy",
                    "Super Mario Galaxy 2",
                    "Metroid Prime",
                    "Super Mario Odyssey",
                    "Halo: Combat Evolved",
                    "The House in Fata Morgana - Dreams of the Revenants Edition -",
                    "NFL 2K1",
                    "Uncharted 2: Among Thieves",
                    "Tekken 3",
                    "The Legend of Zelda: The Wind Waker",
                    "Gran Turismo",
                    "Metal Gear Solid 2: Sons of Liberty",
                    "Grand Theft Auto Double Pack",
                    "Baldur's Gate II: Shadows of Amn",
                    "Tetris Effect: Connected",
                    "The Legend of Zelda Collector's Edition",
                    "Gran Turismo 3: A-Spec",
                    "The Legend of Zelda: A Link to the Past",
                    "The Legend of Zelda: Majora's Mask",
                    "The Last of Us",
                    "Persona 5 Royal",
                    "The Last of Us Remastered",
                    "The Legend of Zelda: Ocarina of Time 3D",
                    "Chrono Cross",
                    "Gears of War",
                    "Sid Meier's Civilization II",
                    "Halo 3",
                    "Ninja Gaiden Black",
                    "Super Mario Advance 4: Super Mario Bros. 3",
                    "Jet Grind Radio",
                    "Grim Fandango"
                ],
                "parse_status_code": 12000
            },
            "created_at": "2025-10-27 11:41:18",
            "updated_at": "2025-10-27 11:41:19",
            "page": 1,
            "url": "https://sandbox.oxylabs.io/products",
            "job_id": "7388540292158203905",
            "is_render_forced": false,
            "status_code": 200,
            "type": "parsed",
            "parser_type": "preset",
            "parser_preset": "my_new_parser"
        }
    ]
}
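
On the client side, the job payload and the parsed result can be handled with a few lines of code. This is a minimal sketch, assuming Python; `build_job_payload` and `parsed_fields` are hypothetical helpers, and the sample data is a trimmed copy of the output above rather than a live response.

```python
def build_job_payload(url, preset_name, source="universal"):
    """Payload for a scraping job that parses the page with a saved preset."""
    return {
        "source": source,
        "url": url,
        "parse": True,
        "parser_preset": preset_name,
    }


def parsed_fields(response, field):
    """Collect one parsed field from every result in a job response."""
    values = []
    for result in response.get("results", []):
        content = result.get("content")
        # Skip results whose content is not a parsed object.
        if isinstance(content, dict) and isinstance(content.get(field), list):
            values.extend(content[field])
    return values


# Trimmed copy of the documented output, used as offline sample data.
sample = {
    "results": [
        {
            "content": {
                "titles": [
                    "The Legend of Zelda: Ocarina of Time",
                    "Super Mario Galaxy",
                ],
                "parse_status_code": 12000,
            },
            "status_code": 200,
            "parser_preset": "my_new_parser",
        }
    ]
}

print(build_job_payload("https://sandbox.oxylabs.io/products", "my_new_parser"))
print(parsed_fields(sample, "titles"))
```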

Retrieve a preset

Endpoint: GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}

Output:

{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:40:22"
}

Update a preset

Endpoint: PUT https://data.oxylabs.io/v1/parsers/presets/{preset_name}

Include only the fields of the preset that you want to change. In the following example, only parsing_instructions is updated; the other fields keep their existing values.

Payload:

{
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        },
        "prices": {
            "_fns": [
                {
                    "_args": [".price-wrapper"],
                    "_fn": "css"
                },
                {"_fn": "element_text"}
            ]
        }
    }
}

Output:

{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "prices": {
            "_fns": [
                {
                    "_args": [
                        ".price-wrapper"
                    ],
                    "_fn": "css"
                },
                {
                    "_fn": "element_text"
                }
            ]
        },
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:44:24"
}
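
Because an update carries only the fields being changed, a payload builder can simply reject anything unexpected and pass the rest through. This is a minimal sketch, assuming Python; `build_update_payload` is a hypothetical helper, and the set of editable fields is an assumption based on the field names shown in the outputs on this page.

```python
import json

# Assumption: these fields from the preset output can be edited via PUT.
UPDATABLE_FIELDS = {
    "description",
    "parsing_instructions",
    "prompt_text",
    "prompt_schema",
    "urls",
    "render",
    "self_heal",
}


def build_update_payload(**changes):
    """Return a partial-update payload containing only the given fields."""
    if not changes:
        raise ValueError("nothing to update")
    unknown = set(changes) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return dict(changes)


payload = build_update_payload(
    parsing_instructions={
        "titles": {"_fns": [{"_fn": "xpath", "_args": ["//h4/text()"]}]},
        "prices": {
            "_fns": [
                {"_fn": "css", "_args": [".price-wrapper"]},
                {"_fn": "element_text"},
            ]
        },
    }
)
print(json.dumps(payload, indent=4))
```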

Delete a preset

Endpoint: DELETE https://data.oxylabs.io/v1/parsers/presets/{preset_name}

List all presets

Endpoint: GET https://data.oxylabs.io/v1/parsers/presets

Output:

[
    {
        "id": 421950,
        "name": "books_parser",
        "description": "Parses all book titles on the page.",
        "prompt_text": null,
        "prompt_schema": null,
        "urls": [],
        "render": false,
        "parsing_instructions": {
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h3//text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "self_heal": false,
        "heal_status": "disabled",
        "last_healed_at": null,
        "created_at": "2025-10-27 11:46:59",
        "updated_at": "2025-10-27 11:46:59"
    },
    {
        "id": 421947,
        "name": "my_new_parser",
        "description": "Extract text from all H4 elements on the page.",
        "prompt_text": null,
        "prompt_schema": null,
        "urls": [],
        "render": false,
        "parsing_instructions": {
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h4/text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "self_heal": false,
        "heal_status": "disabled",
        "last_healed_at": null,
        "created_at": "2025-10-27 11:40:22",
        "updated_at": "2025-10-27 11:45:20"
    }
]

View statistics

Endpoint: GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats

Output:

{
    "total_results": 9,
    "successful_results": 9,
    "success_rate": 100,
    "success_rate_by_path": {
        "titles": 100
    }
}

You can filter results by date and time using the date_from and/or date_to URL parameters. Use the format YYYY-MM-DDTHH, where T separates the date from the time and HH is the hour in 24-hour format.

For example, to get statistics from 9 AM to 2 PM on August 5, 2025:

https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats?date_from=2025-08-05T09&date_to=2025-08-05T14
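
A filtered statistics URL can be generated from datetime values instead of being written by hand. This is a minimal sketch, assuming Python; `stats_url` is a hypothetical helper, and it writes the hour with two digits to match the HH part of the format. This page does not state which time zone the API expects, so the values are passed through unchanged.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

BASE_URL = "https://data.oxylabs.io/v1/parsers/presets"
HOUR_FORMAT = "%Y-%m-%dT%H"  # YYYY-MM-DDTHH


def stats_url(preset_name, date_from=None, date_to=None):
    """Build the statistics URL, adding date filters only when given."""
    params = {}
    if date_from is not None:
        params["date_from"] = date_from.strftime(HOUR_FORMAT)
    if date_to is not None:
        params["date_to"] = date_to.strftime(HOUR_FORMAT)
    url = f"{BASE_URL}/{quote(preset_name, safe='')}/stats"
    return f"{url}?{urlencode(params)}" if params else url


# Statistics from 9 AM to 2 PM on August 5, 2025.
print(stats_url("my_new_parser", datetime(2025, 8, 5, 9), datetime(2025, 8, 5, 14)))
```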

Track self-healing changes

Endpoint: GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}/changelog

Our system automatically logs self-healing activity. You can access this historical log to review all modifications made by the self-healing function.
