
Parser Presets

Learn how parser presets work and how to use them in your scraping jobs.

You can save, reuse, and modify custom parsing instructions through the Web Scraper API. Once you create a parser preset, we'll host it on our system, enabling you to reference it in your scraping jobs via the parser_preset parameter in the payload.

This feature offers several key capabilities:

  • Save and manage your own parsers on our system

  • Easily reuse presets across multiple scraping jobs

  • Create, retrieve, update, delete, and list all presets

  • Access performance and usage statistics of a preset

API reference

Endpoint: https://data.oxylabs.io/v1/parsers/presets

The table lists each available operation and its endpoint path:

Action                                | Request Method | Path
--------------------------------------|----------------|----------------------------------------
Create a preset                       | POST           | /v1/parsers/presets
Retrieve a preset                     | GET            | /v1/parsers/presets/{preset_name}
Update a preset                       | PUT            | /v1/parsers/presets/{preset_name}
Delete a preset                       | DELETE         | /v1/parsers/presets/{preset_name}
List all presets                      | GET            | /v1/parsers/presets
View usage and performance statistics | GET            | /v1/parsers/presets/{preset_name}/stats
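
The operations listed above can be captured in a small lookup that returns the HTTP method and full URL for each action. This is a minimal sketch rather than an official client: the names ROUTES and preset_route are hypothetical, and URL-encoding the preset name is a precaution the reference does not mention.

```python
from urllib.parse import quote

BASE_URL = "https://data.oxylabs.io"

# (HTTP method, path template) for each action in the API reference.
ROUTES = {
    "create": ("POST", "/v1/parsers/presets"),
    "retrieve": ("GET", "/v1/parsers/presets/{preset_name}"),
    "update": ("PUT", "/v1/parsers/presets/{preset_name}"),
    "delete": ("DELETE", "/v1/parsers/presets/{preset_name}"),
    "list": ("GET", "/v1/parsers/presets"),
    "stats": ("GET", "/v1/parsers/presets/{preset_name}/stats"),
}


def preset_route(action, preset_name=None):
    """Return (method, url) for a preset action (hypothetical helper)."""
    method, template = ROUTES[action]
    if "{preset_name}" in template:
        if not preset_name:
            raise ValueError(f"action {action!r} requires a preset name")
        # Encode the name so it is safe to place in a URL path segment.
        template = template.replace("{preset_name}", quote(preset_name, safe=""))
    return method, BASE_URL + template


method, url = preset_route("stats", "my_new_parser")
print(method, url)
```

Keeping the routes in one table means the rest of your code only needs an action name and, where required, a preset name.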

Usage examples

Create a preset

Endpoint: POST https://data.oxylabs.io/v1/parsers/presets

Payload:

{
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        }
    }
}

Output:

{
    "created_at": "2025-08-04 11:42:43",
    "description": "Extract text from all H4 elements on the page.",
    "id": 421386,
    "name": "my_new_parser",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "updated_at": "2025-08-04 11:42:43",
    "urls": []
}
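
As a rough illustration, the create call can be prepared with Python's standard library. The sketch below only builds the request and does not send it. It uses the create path from the API reference, and it assumes HTTP Basic authentication, which this page does not specify; USERNAME and PASSWORD are placeholders, and build_create_request is a hypothetical helper name.

```python
import base64
import json
import urllib.request

PRESETS_URL = "https://data.oxylabs.io/v1/parsers/presets"


def build_create_request(username, password, name, description, parsing_instructions):
    """Build (but do not send) a POST request that creates a parser preset."""
    payload = {
        "name": name,
        "description": description,
        "parsing_instructions": parsing_instructions,
    }
    # Assumption: the API accepts HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        PRESETS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_create_request(
    "USERNAME",
    "PASSWORD",
    name="my_new_parser",
    description="Extract text from all H4 elements on the page.",
    parsing_instructions={
        "titles": {"_fns": [{"_fn": "xpath", "_args": ["//h4/text()"]}]}
    },
)
# To send it: urllib.request.urlopen(request)
```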

Use a preset

Endpoint: POST https://realtime.oxylabs.io/v1/queries

Payload:

{
    "source": "universal",
    "url": "https://sandbox.oxylabs.io/products",
    "parse": true,
    "parser_preset": "my_new_parser"
}

Output:

{
  "results": [
    {
      "content": {
        "titles": [
          "The Legend of Zelda: Ocarina of Time",
          "Super Mario Galaxy",
          "Super Mario Galaxy 2",
          "Metroid Prime",
          "Super Mario Odyssey",
          "Halo: Combat Evolved",
          "The House in Fata Morgana - Dreams of the Revenants Edition -",
          "NFL 2K1",
          "Uncharted 2: Among Thieves",
          "Tekken 3",
          "The Legend of Zelda: The Wind Waker",
          "Gran Turismo",
          "Metal Gear Solid 2: Sons of Liberty",
          "Grand Theft Auto Double Pack",
          "Baldur's Gate II: Shadows of Amn",
          "Tetris Effect: Connected",
          "The Legend of Zelda Collector's Edition",
          "Gran Turismo 3: A-Spec",
          "The Legend of Zelda: A Link to the Past",
          "The Legend of Zelda: Majora's Mask",
          "The Last of Us",
          "Persona 5 Royal",
          "The Last of Us Remastered",
          "The Legend of Zelda: Ocarina of Time 3D",
          "Chrono Cross",
          "Gears of War",
          "Sid Meier's Civilization II",
          "Halo 3",
          "Ninja Gaiden Black",
          "Super Mario Advance 4: Super Mario Bros. 3",
          "Jet Grind Radio",
          "Grim Fandango"
        ],
        "parse_status_code": 12000
      },
      "created_at": "2025-08-04 11:48:51",
      "updated_at": "2025-08-04 11:48:51",
      "page": 1,
      "url": "https://sandbox.oxylabs.io/products",
      "job_id": "7358101611849202690",
      "is_render_forced": false,
      "status_code": 200,
      "parser_type": "preset",
      "parser_preset": "my_new_parser",
      "_request": {
        "cookies": [],
        "headers": {
          "Te": "trailers",
          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0",
          "Sec-Fetch-Dest": "document",
          "Sec-Fetch-Mode": "navigate",
          "Sec-Fetch-Site": "none",
          "Sec-Fetch-User": "?1",
          "Accept-Encoding": "gzip, deflate, br",
          "Accept-Language": "en-US,en;q=0.5",
          "Upgrade-Insecure-Requests": "1"
        }
      },
      "_response": {
        "cookies": [],
        "headers": {
          "via": "1.1 google",
          "date": "Mon, 04 Aug 2025 11:48:51 GMT",
          "vary": "Accept-Encoding",
          "cf-ray": "969dd3fccff1c9d4-OTP",
          "server": "cloudflare",
          "alt-svc": "h3=\":443\"; ma=86400",
          "content-type": "text/html; charset=utf-8",
          "x-powered-by": "Next.js",
          "cache-control": "private, no-cache, no-store, max-age=0, must-revalidate",
          "server-timing": "cfOrigin;dur=407,cfEdge;dur=95",
          "cf-cache-status": "DYNAMIC",
          "content-encoding": "br"
        }
      },
      "session_info": {
        "id": null,
        "expires_at": null,
        "remaining": null
      }
    }
  ],
  "job": {
    "callback_url": null,
    "client_id": 13159,
    "context": [
      {
        "key": "force_headers",
        "value": false
      },
      {
        "key": "force_cookies",
        "value": false
      },
      {
        "key": "hc_policy",
        "value": true
      },
      {
        "key": "successful_status_codes",
        "value": []
      },
      {
        "key": "follow_redirects",
        "value": null
      },
      {
        "key": "cookies",
        "value": []
      },
      {
        "key": "headers",
        "value": []
      },
      {
        "key": "session_id",
        "value": null
      },
      {
        "key": "http_method",
        "value": "get"
      },
      {
        "key": "content",
        "value": null
      },
      {
        "key": "store_id",
        "value": null
      },
      {
        "key": "proxy_location",
        "value": null
      },
      {
        "key": "delivery_location",
        "value": null
      },
      {
        "key": "fulfillment_type",
        "value": null
      }
    ],
    "created_at": "2025-08-04 11:48:51",
    "domain": "io",
    "geo_location": null,
    "id": "7358101611849202690",
    "limit": 10,
    "locale": null,
    "pages": 1,
    "parse": true,
    "parser_type": "preset",
    "parser_preset": "my_new_parser",
    "parsing_instructions": null,
    "browser_instructions": null,
    "render": null,
    "xhr": false,
    "markdown": false,
    "url": "https://sandbox.oxylabs.io/products",
    "query": "",
    "source": "universal",
    "start_page": 1,
    "status": "done",
    "storage_type": null,
    "storage_url": null,
    "subdomain": "sandbox",
    "content_encoding": "utf-8",
    "updated_at": "2025-08-04 11:48:51",
    "user_agent_type": "desktop",
    "session_info": null,
    "statuses": [],
    "client_notes": null,
    "_links": [
      {
        "rel": "self",
        "href": "http://data.oxylabs.io/v1/queries/7358101611849202690",
        "method": "GET"
      },
      {
        "rel": "results",
        "href": "http://data.oxylabs.io/v1/queries/7358101611849202690/results",
        "method": "GET"
      },
      {
        "rel": "results-content",
        "href_list": [
          "http://data.oxylabs.io/v1/queries/7358101611849202690/results/1/content"
        ],
        "method": "GET"
      },
      {
        "rel": "results-html",
        "href": "http://data.oxylabs.io/v1/queries/7358101611849202690/results?type=raw",
        "method": "GET"
      },
      {
        "rel": "results-content-html",
        "href_list": [
          "http://data.oxylabs.io/v1/queries/7358101611849202690/results/1/content?type=raw"
        ],
        "method": "GET"
      },
      {
        "rel": "results-parsed",
        "href": "http://data.oxylabs.io/v1/queries/7358101611849202690/results?type=parsed",
        "method": "GET"
      },
      {
        "rel": "results-content-parsed",
        "href_list": [
          "http://data.oxylabs.io/v1/queries/7358101611849202690/results/1/content?type=parsed"
        ],
        "method": "GET"
      }
    ]
  }
}
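
The response is long, but the parsed data sits under results[].content. The following sketch shows one way to build the job payload and pull a single parsed field out of a response. The helper names are hypothetical, and sample_response is a trimmed-down stand-in for the real output, not data returned by the API.

```python
def build_job_payload(url, preset_name, source="universal"):
    """Return a scraping-job payload that references a hosted preset."""
    return {
        "source": source,
        "url": url,
        "parse": True,
        "parser_preset": preset_name,
    }


def parsed_field(response, field):
    """Collect one parsed field from every result in a job response."""
    values = []
    for result in response.get("results", []):
        content = result.get("content")
        # Only dict content carries parsed fields; skip anything else.
        if not isinstance(content, dict):
            continue
        value = content.get(field)
        if isinstance(value, list):
            values.extend(value)
        elif value is not None:
            values.append(value)
    return values


# Trimmed-down response in the same shape as the sample output.
sample_response = {
    "results": [
        {
            "content": {
                "titles": ["Super Mario Galaxy", "Metroid Prime"],
                "parse_status_code": 12000,
            },
            "parser_preset": "my_new_parser",
        }
    ]
}

payload = build_job_payload("https://sandbox.oxylabs.io/products", "my_new_parser")
titles = parsed_field(sample_response, "titles")
print(titles)
```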

Retrieve a preset

Endpoint: GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}

Output:

{
    "created_at": "2025-08-04 11:42:43",
    "description": "Extract text from all H4 elements on the page.",
    "id": 421386,
    "name": "my_new_parser",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "updated_at": "2025-08-04 11:42:43",
    "urls": []
}

Update a preset

Endpoint: PUT https://data.oxylabs.io/v1/parsers/presets/{preset_name}

Define the fields of the preset you want to update. In the following example, only the parsing_instructions will get updated.

Payload:

{
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        },
        "prices": {
            "_fns": [
                {
                    "_args": [".price-wrapper"],
                    "_fn": "css"
                },
                {"_fn": "element_text"}
            ]
        }
    }
}

Output:

{
    "created_at": "2025-08-04 11:42:43",
    "description": "Extract text from all H4 elements on the page.",
    "id": 421386,
    "name": "my_new_parser",
    "parsing_instructions": {
        "prices": {
            "_fns": [
                {
                    "_args": [
                        ".price-wrapper"
                    ],
                    "_fn": "css"
                },
                {
                    "_fn": "element_text"
                }
            ]
        },
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "updated_at": "2025-08-04 11:45:04",
    "urls": []
}
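
Note that the example payload resends the existing titles field alongside the new prices field. That suggests the submitted parsing_instructions object replaces the stored one as a whole, although this page does not state it, so treat that as an assumption. Under that assumption, a safe pattern is to start from the current instructions and add to a copy. The helper add_field below is hypothetical.

```python
import copy
import json


def add_field(current_instructions, field, fns):
    """Return a copy of the parsing instructions with one extra field."""
    updated = copy.deepcopy(current_instructions)
    updated[field] = {"_fns": fns}
    return updated


# Instructions as stored after the preset was created.
current = {"titles": {"_fns": [{"_fn": "xpath", "_args": ["//h4/text()"]}]}}

# Body for the PUT request: only parsing_instructions is included.
update_body = {
    "parsing_instructions": add_field(
        current,
        "prices",
        [{"_fn": "css", "_args": [".price-wrapper"]}, {"_fn": "element_text"}],
    )
}

print(json.dumps(update_body, indent=4))
```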

Delete a preset

Endpoint: DELETE https://data.oxylabs.io/v1/parsers/presets/{preset_name}

List all presets

Endpoint: GET https://data.oxylabs.io/v1/parsers/presets

Output:

[
    {
        "created_at": "2025-08-04 11:42:43",
        "description": "Extract text from all H4 elements on the page.",
        "id": 421386,
        "name": "my_new_parser",
        "parsing_instructions": {
            "prices": {
                "_fns": [
                    {
                        "_args": [
                            ".price-wrapper"
                        ],
                        "_fn": "css"
                    },
                    {
                        "_fn": "element_text"
                    }
                ]
            },
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h4/text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "updated_at": "2025-08-04 11:45:04",
        "urls": []
    },
    {
        "created_at": "2025-08-04 09:58:58",
        "description": "Parses all book titles on the page.",
        "id": 421379,
        "name": "books_parser",
        "parsing_instructions": {
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h3//text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "updated_at": "2025-08-04 09:58:58",
        "urls": []
    }
]
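
Because the listing is a plain JSON array, it is easy to condense into an overview. This sketch maps each preset name to the fields it extracts and its last update time. The function name summarize_presets is hypothetical, and sample_listing is a shortened copy of the output above with the function chains omitted.

```python
def summarize_presets(presets):
    """Map each preset name to its extracted fields and last update time."""
    return {
        preset["name"]: {
            "fields": sorted(preset.get("parsing_instructions", {})),
            "updated_at": preset.get("updated_at"),
        }
        for preset in presets
    }


# Shortened listing in the same shape as the sample output.
sample_listing = [
    {
        "name": "my_new_parser",
        "parsing_instructions": {"prices": {"_fns": []}, "titles": {"_fns": []}},
        "updated_at": "2025-08-04 11:45:04",
    },
    {
        "name": "books_parser",
        "parsing_instructions": {"titles": {"_fns": []}},
        "updated_at": "2025-08-04 09:58:58",
    },
]

summary = summarize_presets(sample_listing)
for name, info in summary.items():
    print(name, info["fields"], info["updated_at"])
```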

View statistics

Endpoint: GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats

Output:

{
    "success_rate": 100,
    "success_rate_by_path": {
        "prices": 100,
        "titles": 100
    },
    "successful_results": 7,
    "total_results": 7
}
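
The success_rate_by_path object makes it possible to spot which parsed fields are failing. The sketch below flags paths under a chosen threshold. It assumes the rates are percentages from 0 to 100, as the sample values suggest; the helper weak_paths and the default threshold of 90 are illustrative choices, not part of the API.

```python
def weak_paths(stats, threshold=90):
    """Return (path, rate) pairs below the threshold, worst first."""
    by_path = stats.get("success_rate_by_path", {})
    failing = [(path, rate) for path, rate in by_path.items() if rate < threshold]
    return sorted(failing, key=lambda item: item[1])


# Same values as the sample output.
sample_stats = {
    "success_rate": 100,
    "success_rate_by_path": {"prices": 100, "titles": 100},
    "successful_results": 7,
    "total_results": 7,
}

print(weak_paths(sample_stats))
```

With the sample statistics, every path is at 100, so nothing is flagged.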

You can filter statistics by date and time using the date_from and/or date_to URL parameters. Use the format YYYY-MM-DDTHH, where T separates the date from the time and HH is the hour in 24-hour format.

For example, to get statistics from 9 AM to 2 PM on August 5, 2025:

https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats?date_from=2025-08-05T09&date_to=2025-08-05T14
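
If you build these URLs in code, formatting the hour from a datetime object avoids hand-typed mistakes. The following sketch is one possible approach: stats_url is a hypothetical helper, it writes the hour with two digits to match the HH in the format, and it formats the datetime values as given because this page does not say which time zone the API expects.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

STATS_URL = "https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats"


def stats_url(preset_name, date_from=None, date_to=None):
    """Build the statistics URL, adding date filters only when given."""
    url = STATS_URL.format(preset_name=quote(preset_name, safe=""))
    params = {}
    if date_from is not None:
        # YYYY-MM-DDTHH with a zero-padded 24-hour value.
        params["date_from"] = date_from.strftime("%Y-%m-%dT%H")
    if date_to is not None:
        params["date_to"] = date_to.strftime("%Y-%m-%dT%H")
    return f"{url}?{urlencode(params)}" if params else url


url = stats_url(
    "my_new_parser",
    date_from=datetime(2025, 8, 5, 9),
    date_to=datetime(2025, 8, 5, 14),
)
print(url)
```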
