
Parser Presets

You can save custom parsing instructions as parser presets through the Web Scraper API, then reuse and modify them. Once you create a parser preset, we host it on our system, so you can reference it by name in your scraping jobs via the parser_preset parameter in the job payload.

This feature offers several key capabilities:

  • Save and manage your own parsers on our system

  • Easily reuse presets across multiple scraping jobs

  • Create, retrieve, update, delete, and list all presets

  • Access performance and usage statistics for a preset

API reference

You can create and manage parser presets through either the Realtime or the Push-Pull integration endpoint:

Realtime integration endpoint

https://realtime.oxylabs.io/v1/parsers

Push-Pull integration endpoint

https://data.oxylabs.io/v1/parsers

The table lists each available operation and its endpoint path:

Action                                   Request method   Path
Create a preset                          POST             /v1/parsers
Retrieve a preset                        GET              /v1/parsers/{preset_name}
Update a preset                          PUT              /v1/parsers/{preset_name}
Delete a preset                          DELETE           /v1/parsers/{preset_name}
List all presets                         GET              /v1/parsers
View usage and performance statistics    GET              /v1/parsers/{preset_name}/stats
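
The table maps directly onto a small lookup structure. Below is a minimal Python sketch, using only the standard library, that builds (but does not send) a request for any of these operations. It assumes HTTP Basic authentication with your Web Scraper API username and password; OPERATIONS and build_request are hypothetical helper names, not part of an Oxylabs SDK.

```python
import base64
import json
import urllib.parse
import urllib.request

# Hosts for the two integration methods (assumed interchangeable for presets).
HOSTS = {
    "realtime": "https://realtime.oxylabs.io",
    "push-pull": "https://data.oxylabs.io",
}

# (HTTP method, path template) for each preset operation in the table.
OPERATIONS = {
    "create": ("POST", "/v1/parsers"),
    "retrieve": ("GET", "/v1/parsers/{preset_name}"),
    "update": ("PUT", "/v1/parsers/{preset_name}"),
    "delete": ("DELETE", "/v1/parsers/{preset_name}"),
    "list": ("GET", "/v1/parsers"),
    "stats": ("GET", "/v1/parsers/{preset_name}/stats"),
}


def build_request(operation, username, password, preset_name=None,
                  payload=None, integration="push-pull"):
    """Build, but do not send, a urllib Request for one preset operation.

    Assumes HTTP Basic authentication with Web Scraper API credentials.
    """
    method, template = OPERATIONS[operation]
    if "{preset_name}" in template:
        if not preset_name:
            raise ValueError(f"{operation!r} requires a preset_name")
        # Percent-encode the name so it stays a single path segment.
        path = template.format(preset_name=urllib.parse.quote(preset_name, safe=""))
    else:
        path = template
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(HOSTS[integration] + path, data=data,
                                  headers=headers, method=method)
```

To send a request, pass it to urllib.request.urlopen(), for example urllib.request.urlopen(build_request("list", "USERNAME", "PASSWORD")). Keeping the routes in one dictionary means the table and the code change in one place if a path is added.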

Usage examples

Create a preset

Endpoint: POST https://data.oxylabs.io/v1/parsers

Payload:

{
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        }
    }
}
Output
{
    "created_at": "2025-08-04 11:42:43",
    "description": "Extract text from all H4 elements on the page.",
    "id": 421386,
    "name": "my_new_parser",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "updated_at": "2025-08-04 11:42:43",
    "urls": []
}
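
The same create call in Python, as a sketch under the same assumptions (HTTP Basic authentication, placeholder credentials). The hypothetical xpath_field helper produces the single-step _fns structure used in the payload above, so further xpath fields can be added one line at a time.

```python
import base64
import json
import urllib.request

CREATE_URL = "https://data.oxylabs.io/v1/parsers"


def xpath_field(expression):
    """One parsing-instructions field that applies a single xpath function."""
    return {"_fns": [{"_fn": "xpath", "_args": [expression]}]}


def create_preset_request(name, description, parsing_instructions, username, password):
    """Build a POST request that creates a preset (Basic auth is an assumption)."""
    payload = {
        "name": name,
        "description": description,
        "parsing_instructions": parsing_instructions,
    }
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        CREATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = create_preset_request(
    "my_new_parser",
    "Extract text from all H4 elements on the page.",
    {"titles": xpath_field("//h4/text()")},
    "USERNAME",  # placeholder credentials
    "PASSWORD",
)
# Sending it returns the stored preset, including its id and timestamps:
# with urllib.request.urlopen(req) as resp:
#     preset = json.load(resp)
#     print(preset["id"], preset["created_at"])
```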

Use a preset

Endpoint: POST https://realtime.oxylabs.io/v1/queries

Payload:

{
    "source": "universal",
    "url": "https://sandbox.oxylabs.io/products",
    "parse": true,
    "parser_preset": "my_new_parser"
}
Output
{
  "results": [
    {
      "content": {
        "titles": [
          "The Legend of Zelda: Ocarina of Time",
          "Super Mario Galaxy",
          "Super Mario Galaxy 2",
          "Metroid Prime",
          "Super Mario Odyssey",
          "Halo: Combat Evolved",
          "The House in Fata Morgana - Dreams of the Revenants Edition -",
          "NFL 2K1",
          "Uncharted 2: Among Thieves",
          "Tekken 3",
          "The Legend of Zelda: The Wind Waker",
          "Gran Turismo",
          "Metal Gear Solid 2: Sons of Liberty",
          "Grand Theft Auto Double Pack",
          "Baldur's Gate II: Shadows of Amn",
          "Tetris Effect: Connected",
          "The Legend of Zelda Collector's Edition",
          "Gran Turismo 3: A-Spec",
          "The Legend of Zelda: A Link to the Past",
          "The Legend of Zelda: Majora's Mask",
          "The Last of Us",
          "Persona 5 Royal",
          "The Last of Us Remastered",
          "The Legend of Zelda: Ocarina of Time 3D",
          "Chrono Cross",
          "Gears of War",
          "Sid Meier's Civilization II",
          "Halo 3",
          "Ninja Gaiden Black",
          "Super Mario Advance 4: Super Mario Bros. 3",
          "Jet Grind Radio",
          "Grim Fandango"
        ],
        "parse_status_code": 12000
      },
      "created_at": "2025-08-04 11:48:51",
      "updated_at": "2025-08-04 11:48:51",
      "page": 1,
      "url": "https://sandbox.oxylabs.io/products",
      "job_id": "7358101611849202690",
      "is_render_forced": false,
      "status_code": 200,
      "parser_type": "preset",
      "parser_preset": "my_new_parser",
      "_request": {
        "cookies": [],
        "headers": {
          "Te": "trailers",
          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0",
          "Sec-Fetch-Dest": "document",
          "Sec-Fetch-Mode": "navigate",
          "Sec-Fetch-Site": "none",
          "Sec-Fetch-User": "?1",
          "Accept-Encoding": "gzip, deflate, br",
          "Accept-Language": "en-US,en;q=0.5",
          "Upgrade-Insecure-Requests": "1"
        }
      },
      "_response": {
        "cookies": [],
        "headers": {
          "via": "1.1 google",
          "date": "Mon, 04 Aug 2025 11:48:51 GMT",
          "vary": "Accept-Encoding",
          "cf-ray": "969dd3fccff1c9d4-OTP",
          "server": "cloudflare",
          "alt-svc": "h3=\":443\"; ma=86400",
          "content-type": "text/html; charset=utf-8",
          "x-powered-by": "Next.js",
          "cache-control": "private, no-cache, no-store, max-age=0, must-revalidate",
          "server-timing": "cfOrigin;dur=407,cfEdge;dur=95",
          "cf-cache-status": "DYNAMIC",
          "content-encoding": "br"
        }
      },
      "session_info": {
        "id": null,
        "expires_at": null,
        "remaining": null
      }
    }
  ],
  "job": {
    "callback_url": null,
    "client_id": 13159,
    "context": [
      {
        "key": "force_headers",
        "value": false
      },
      {
        "key": "force_cookies",
        "value": false
      },
      {
        "key": "hc_policy",
        "value": true
      },
      {
        "key": "successful_status_codes",
        "value": []
      },
      {
        "key": "follow_redirects",
        "value": null
      },
      {
        "key": "cookies",
        "value": []
      },
      {
        "key": "headers",
        "value": []
      },
      {
        "key": "session_id",
        "value": null
      },
      {
        "key": "http_method",
        "value": "get"
      },
      {
        "key": "content",
        "value": null
      },
      {
        "key": "store_id",
        "value": null
      },
      {
        "key": "proxy_location",
        "value": null
      },
      {
        "key": "delivery_location",
        "value": null
      },
      {
        "key": "fulfillment_type",
        "value": null
      }
    ],
    "created_at": "2025-08-04 11:48:51",
    "domain": "io",
    "geo_location": null,
    "id": "7358101611849202690",
    "limit": 10,
    "locale": null,
    "pages": 1,
    "parse": true,
    "parser_type": "preset",
    "parser_preset": "my_new_parser",
    "parsing_instructions": null,
    "browser_instructions": null,
    "render": null,
    "xhr": false,
    "markdown": false,
    "url": "https://sandbox.oxylabs.io/products",
    "query": "",
    "source": "universal",
    "start_page": 1,
    "status": "done",
    "storage_type": null,
    "storage_url": null,
    "subdomain": "sandbox",
    "content_encoding": "utf-8",
    "updated_at": "2025-08-04 11:48:51",
    "user_agent_type": "desktop",
    "session_info": null,
    "statuses": [],
    "client_notes": null,
    "_links": [
      {
        "rel": "self",
        "href": "http://data.oxylabs.io/v1/queries/7358101611849202690",
        "method": "GET"
      },
      {
        "rel": "results",
        "href": "http://data.oxylabs.io/v1/queries/7358101611849202690/results",
        "method": "GET"
      },
      {
        "rel": "results-content",
        "href_list": [
          "http://data.oxylabs.io/v1/queries/7358101611849202690/results/1/content"
        ],
        "method": "GET"
      },
      {
        "rel": "results-html",
        "href": "http://data.oxylabs.io/v1/queries/7358101611849202690/results?type=raw",
        "method": "GET"
      },
      {
        "rel": "results-content-html",
        "href_list": [
          "http://data.oxylabs.io/v1/queries/7358101611849202690/results/1/content?type=raw"
        ],
        "method": "GET"
      },
      {
        "rel": "results-parsed",
        "href": "http://data.oxylabs.io/v1/queries/7358101611849202690/results?type=parsed",
        "method": "GET"
      },
      {
        "rel": "results-content-parsed",
        "href_list": [
          "http://data.oxylabs.io/v1/queries/7358101611849202690/results/1/content?type=parsed"
        ],
        "method": "GET"
      }
    ]
  }
}
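
A Python sketch of the Realtime call, assuming HTTP Basic authentication with placeholder credentials. The hypothetical preset_contents helper reads results[].content and uses the parser_type and parser_preset fields shown in the response to confirm that the saved preset produced each result.

```python
import base64
import json
import urllib.request

QUERIES_URL = "https://realtime.oxylabs.io/v1/queries"


def preset_query_request(url, preset_name, username, password):
    """Build a Realtime job that parses `url` with a saved parser preset."""
    payload = {
        "source": "universal",
        "url": url,
        "parse": True,  # set together with parser_preset, as in the example payload
        "parser_preset": preset_name,
    }
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        QUERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


def preset_contents(response, preset_name):
    """Collect parsed content, failing if a result used a different parser."""
    contents = []
    for result in response.get("results", []):
        if result.get("parser_type") != "preset" or result.get("parser_preset") != preset_name:
            raise ValueError(f"{result.get('url')} was not parsed with {preset_name!r}")
        contents.append(result["content"])
    return contents


req = preset_query_request("https://sandbox.oxylabs.io/products", "my_new_parser",
                           "USERNAME", "PASSWORD")
# with urllib.request.urlopen(req) as resp:
#     for content in preset_contents(json.load(resp), "my_new_parser"):
#         print(content["titles"][:3])
```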

Retrieve a preset

Endpoint: GET https://data.oxylabs.io/v1/parsers/{preset_name}

Output
{
    "created_at": "2025-08-04 11:42:43",
    "description": "Extract text from all H4 elements on the page.",
    "id": 421386,
    "name": "my_new_parser",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "updated_at": "2025-08-04 11:42:43",
    "urls": []
}
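
When checking what a stored preset does, it can help to reduce the retrieved object to the chain of functions applied to each field. A sketch; retrieve_request and field_functions are hypothetical names, and auth_header stands for whatever Authorization value you use (Basic authentication is assumed).

```python
import json
import urllib.parse
import urllib.request

PARSERS_URL = "https://data.oxylabs.io/v1/parsers"


def retrieve_request(preset_name, auth_header):
    """GET request for a single preset; auth_header is e.g. 'Basic <token>'."""
    url = f"{PARSERS_URL}/{urllib.parse.quote(preset_name, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": auth_header}, method="GET")


def field_functions(preset):
    """Map each top-level field to its _fn chain (nested fields are not walked)."""
    return {
        field: [step["_fn"] for step in spec.get("_fns", [])]
        for field, spec in preset["parsing_instructions"].items()
    }


# with urllib.request.urlopen(retrieve_request("my_new_parser", "Basic <token>")) as resp:
#     print(field_functions(json.load(resp)))
```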

Update a preset

Endpoint: PUT https://data.oxylabs.io/v1/parsers/{preset_name}

Include only the fields you want to change. In the following example, only parsing_instructions is updated, so the preset's name and description stay the same.

Payload:

{
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        },
        "prices": {
            "_fns": [
                {
                    "_args": [".price-wrapper"],
                    "_fn": "css"
                },
                {"_fn": "element_text"}
            ]
        }
    }
}
Output
{
    "created_at": "2025-08-04 11:42:43",
    "description": "Extract text from all H4 elements on the page.",
    "id": 421386,
    "name": "my_new_parser",
    "parsing_instructions": {
        "prices": {
            "_fns": [
                {
                    "_args": [
                        ".price-wrapper"
                    ],
                    "_fn": "css"
                },
                {
                    "_fn": "element_text"
                }
            ]
        },
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "updated_at": "2025-08-04 11:45:04",
    "urls": []
}
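
The update example re-sends the existing titles field alongside the new prices field. If parsing_instructions is replaced as a whole rather than merged (our reading of that example, not something this page states), adding a field safely means fetching the current preset, merging locally, and sending the merged object. A sketch of the merge step and the PUT request; merged_update_payload and update_request are hypothetical helpers and the auth header is a placeholder.

```python
import copy
import json
import urllib.parse
import urllib.request


def merged_update_payload(current_preset, new_fields):
    """Keep the preset's existing fields and add or override the given ones.

    Only parsing_instructions is sent, so name and description are left as they are.
    """
    instructions = copy.deepcopy(current_preset["parsing_instructions"])
    instructions.update(new_fields)
    return {"parsing_instructions": instructions}


def update_request(preset_name, payload, auth_header):
    """PUT request that updates the named preset with `payload`."""
    url = "https://data.oxylabs.io/v1/parsers/" + urllib.parse.quote(preset_name, safe="")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
        method="PUT",
    )


current = {
    "name": "my_new_parser",
    "parsing_instructions": {"titles": {"_fns": [{"_fn": "xpath", "_args": ["//h4/text()"]}]}},
}
prices = {"_fns": [{"_fn": "css", "_args": [".price-wrapper"]}, {"_fn": "element_text"}]}
payload = merged_update_payload(current, {"prices": prices})
req = update_request("my_new_parser", payload, "Basic <token>")
```

Deep-copying the current instructions keeps the fetched preset untouched, so it can still be compared with the server's response after the update.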

Delete a preset

Endpoint: DELETE https://data.oxylabs.io/v1/parsers/{preset_name}
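
A DELETE request carries no payload. This sketch builds one with a placeholder auth header (Basic authentication assumed); since this page does not show the response to a deletion, the commented usage only prints the HTTP status. delete_request is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def delete_request(preset_name, auth_header):
    """DELETE /v1/parsers/{preset_name}; no request body is sent."""
    url = "https://data.oxylabs.io/v1/parsers/" + urllib.parse.quote(preset_name, safe="")
    return urllib.request.Request(url, headers={"Authorization": auth_header}, method="DELETE")


req = delete_request("books_parser", "Basic <token>")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```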

List all presets

Endpoint: GET https://data.oxylabs.io/v1/parsers

Output
[
    {
        "created_at": "2025-08-04 11:42:43",
        "description": "Extract text from all H4 elements on the page.",
        "id": 421386,
        "name": "my_new_parser",
        "parsing_instructions": {
            "prices": {
                "_fns": [
                    {
                        "_args": [
                            ".price-wrapper"
                        ],
                        "_fn": "css"
                    },
                    {
                        "_fn": "element_text"
                    }
                ]
            },
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h4/text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "updated_at": "2025-08-04 11:45:04",
        "urls": []
    },
    {
        "created_at": "2025-08-04 09:58:58",
        "description": "Parses all book titles on the page.",
        "id": 421379,
        "name": "books_parser",
        "parsing_instructions": {
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h3//text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "updated_at": "2025-08-04 09:58:58",
        "urls": []
    }
]
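
The list response is a JSON array of full preset objects. A sketch that indexes it by name and picks the most recently updated preset, parsing timestamps in the format shown in the sample output; the timezone is not stated, so they are compared as naive datetimes. Both helper names are hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # e.g. "2025-08-04 11:42:43"


def presets_by_name(presets):
    """Index the list response by preset name."""
    return {preset["name"]: preset for preset in presets}


def most_recently_updated(presets):
    """Return the preset with the latest updated_at timestamp."""
    return max(presets, key=lambda p: datetime.strptime(p["updated_at"], TIMESTAMP_FORMAT))
```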

View statistics

Endpoint: GET https://data.oxylabs.io/v1/parsers/{preset_name}/stats

Output
{
    "success_rate": 100,
    "success_rate_by_path": {
        "prices": 100,
        "titles": 100
    },
    "successful_results": 7,
    "total_results": 7
}
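
In the sample output the rates read as percentages, and success_rate matches successful_results / total_results × 100 (7 of 7). Assuming that relationship holds in general, this sketch recomputes the overall rate and flags fields whose per-path rate falls below a threshold you choose (90 is an arbitrary default). The helper names are hypothetical.

```python
def overall_rate(stats):
    """Recompute the overall success rate in percent; None if there are no results."""
    total = stats["total_results"]
    return 100 * stats["successful_results"] / total if total else None


def weak_paths(stats, threshold=90):
    """Return field paths whose success rate is below threshold, worst first."""
    by_path = stats["success_rate_by_path"]
    return sorted((path for path, rate in by_path.items() if rate < threshold),
                  key=by_path.get)
```

A field that suddenly drops in this list is often the first sign that the target page's markup changed and the selector in the preset needs updating.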

You can limit the statistics to a time window with the date_from and/or date_to URL parameters. Use the format YYYY-MM-DDTHH, where T separates the date from the time and HH is the hour in 24-hour format.

For example, to get statistics from 9 AM to 2 PM on August 5, 2025:

https://data.oxylabs.io/v1/parsers/{preset_name}/stats?date_from=2025-08-05T09&date_to=2025-08-05T14
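
Building the query string from datetime objects avoids hand-formatting mistakes. In this sketch (stats_url is a hypothetical helper), %H zero-pads the hour, matching the two-digit HH in the documented format, and date_from and date_to are each optional.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

STATS_URL = "https://data.oxylabs.io/v1/parsers/{name}/stats"
HOUR_FORMAT = "%Y-%m-%dT%H"  # YYYY-MM-DDTHH, hour in 24-hour format


def stats_url(preset_name, date_from=None, date_to=None):
    """Stats URL for a preset, optionally limited to an hourly time window."""
    url = STATS_URL.format(name=quote(preset_name, safe=""))
    params = {}
    if date_from is not None:
        params["date_from"] = date_from.strftime(HOUR_FORMAT)
    if date_to is not None:
        params["date_to"] = date_to.strftime(HOUR_FORMAT)
    return f"{url}?{urlencode(params)}" if params else url


print(stats_url("my_new_parser", datetime(2025, 8, 5, 9), datetime(2025, 8, 5, 14)))
# https://data.oxylabs.io/v1/parsers/my_new_parser/stats?date_from=2025-08-05T09&date_to=2025-08-05T14
```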
