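# Parser presets

You can **save**, **reuse**, and **modify** custom parsing instructions through the Web Scraper API. Once you create a parser preset, we host it in our system so that you can reference it in scraping jobs with the `parser_preset` parameter in the payload.

This feature offers several **key capabilities**:

* Save and manage your own parsers in our system
* Easily reuse presets across multiple scraping jobs
* Create, retrieve, update, delete, and list all presets
* Access performance and usage statistics for your presets
* Adapt to changing websites with self-healing presets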

## API reference

**Endpoint:** `https://data.oxylabs.io/v1/parsers/presets`

The table below lists each available operation and its endpoint path:

<table><thead><tr><th width="247.30859375">Operation</th><th width="152.23828125">Request method</th><th>Path</th></tr></thead><tbody><tr><td><strong>Create</strong> a preset</td><td><code>POST</code></td><td><code>/v1/parsers/presets</code></td></tr><tr><td><strong>Retrieve</strong> a preset</td><td><code>GET</code></td><td><code>/v1/parsers/presets/{preset_name}</code></td></tr><tr><td><strong>Update</strong> a preset</td><td><code>PUT</code></td><td><code>/v1/parsers/presets/{preset_name}</code></td></tr><tr><td><strong>Delete</strong> a preset</td><td><code>DELETE</code></td><td><code>/v1/parsers/presets/{preset_name}</code></td></tr><tr><td><strong>List all</strong> presets</td><td><code>GET</code></td><td><code>/v1/parsers/presets</code></td></tr><tr><td><strong>View usage</strong> and <strong>performance</strong> statistics</td><td><code>GET</code></td><td><code>/v1/parsers/presets/{preset_name}/stats</code></td></tr><tr><td><strong>Track self-healing</strong> changes</td><td><code>GET</code></td><td><code>/v1/parsers/presets/{preset_name}/changelog</code></td></tr></tbody></table>
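
The operation table can be captured in a small lookup. The following is a minimal Python sketch, not an official client: `OPERATIONS` and `preset_request` are hypothetical names introduced here, and only the methods and paths come from the table.

```python
from urllib.parse import quote

BASE_URL = "https://data.oxylabs.io"

# Method and path for each documented operation (taken from the table).
OPERATIONS = {
    "create": ("POST", "/v1/parsers/presets"),
    "retrieve": ("GET", "/v1/parsers/presets/{preset_name}"),
    "update": ("PUT", "/v1/parsers/presets/{preset_name}"),
    "delete": ("DELETE", "/v1/parsers/presets/{preset_name}"),
    "list": ("GET", "/v1/parsers/presets"),
    "stats": ("GET", "/v1/parsers/presets/{preset_name}/stats"),
    "changelog": ("GET", "/v1/parsers/presets/{preset_name}/changelog"),
}


def preset_request(operation, preset_name=None):
    """Return (method, url) for an operation; hypothetical helper."""
    method, path = OPERATIONS[operation]
    if "{preset_name}" in path:
        if not preset_name:
            raise ValueError(f"'{operation}' needs a preset name")
        # Percent-encode the name so it is safe inside a URL path.
        path = path.replace("{preset_name}", quote(preset_name, safe=""))
    return method, BASE_URL + path


print(preset_request("stats", "my_new_parser"))
```

Keeping the mapping in one place means a caller only needs the operation name and, where the path requires it, the preset name.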

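## Enabling self-healing

Parser presets come with a self-healing feature that helps maintain the parser and its success rate when a website changes. Once enabled, a parser preset **automatically repairs itself** and adjusts the parsing instructions in the background, with no additional manual input.

To **enable self-healing for your custom parser preset**, include the following required parameters when creating or updating the preset:

<table><thead><tr><th width="222.90234375">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><code>self_heal</code></td><td>Enables self-healing when set to <code>true</code>.</td></tr><tr><td><code>prompt_schema</code></td><td>A JSON schema describing the desired parser output. You can <a href="/pages/b093a36bab08203ddafe804951101a0956b1f771">create the schema automatically when generating a parser through the API</a>.</td></tr><tr><td><code>urls</code></td><td>A list of up to 5 URLs of the same page type. We recommend providing 3-5 URLs to help the parser adapt to different layouts and improve parsing accuracy.</td></tr></tbody></table>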

<details>

<summary>Payload example</summary>

The payload example shown here enables self-healing by updating an existing preset.

**Endpoint:** `PUT https://data.oxylabs.io/v1/parsers/presets/{preset_name}`

```json
{
    "self_heal": true,
    "urls": ["https://sandbox.oxylabs.io/products"],
    "prompt_schema": {
        "properties": {
            "product_titles": {
                "description": "Title of each product.",
                "items": {
                    "type": "string"
                },
                "maxItems": 5,
                "title": "Product Titles",
                "type": "array"
            }
        },
        "required": [
            "product_titles"
        ],
        "title": "Fields",
        "type": "object"
    }
}
```

</details>
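
The three required parameters can be checked locally before a request is sent. This is a rough Python sketch: `build_self_heal_payload` is a hypothetical helper, and the lower bound of one URL is an assumption, since the documentation only states the maximum of 5.

```python
import json


def build_self_heal_payload(prompt_schema, urls):
    """Build a payload that enables self-healing (hypothetical helper).

    Checks only what is stated above: all three parameters are present
    and `urls` holds at most 5 entries. Requiring at least one URL is
    an assumption made for this sketch.
    """
    if not isinstance(prompt_schema, dict) or not prompt_schema:
        raise ValueError("prompt_schema must be a non-empty JSON schema object")
    urls = list(urls)
    if not 1 <= len(urls) <= 5:
        raise ValueError("urls must contain between 1 and 5 URLs")
    return {"self_heal": True, "urls": urls, "prompt_schema": prompt_schema}


schema = {
    "type": "object",
    "properties": {
        "product_titles": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["product_titles"],
}
payload = build_self_heal_payload(schema, ["https://sandbox.oxylabs.io/products"])
print(json.dumps(payload, indent=4))
```

Python's `True` serializes to JSON `true`, which matches the value used in the payload example.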

## Usage examples

### Create a preset

**Endpoint:** `POST https://data.oxylabs.io/v1/parsers/presets`

**Payload:**

```json
{
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        }
    }
}
```

<details>

<summary>Output</summary>

```json
{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:40:22"
}
```

</details>
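
For readers scripting this step, here is a minimal Python sketch that builds the create request with the standard library. It only constructs the request and does not send it. HTTP Basic authentication with API credentials is an assumption (this page does not describe authentication), and `build_create_request`, `USERNAME`, and `PASSWORD` are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_create_request(name, description, parsing_instructions,
                         username, password):
    """Build (but do not send) the POST request that creates a preset."""
    payload = {
        "name": name,
        "description": description,
        "parsing_instructions": parsing_instructions,
    }
    # Assumption: the API accepts HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        "https://data.oxylabs.io/v1/parsers/presets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_create_request(
    name="my_new_parser",
    description="Extract text from all H4 elements on the page.",
    parsing_instructions={
        "titles": {"_fns": [{"_fn": "xpath", "_args": ["//h4/text()"]}]}
    },
    username="USERNAME",  # placeholder credentials
    password="PASSWORD",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` and decode the JSON response body.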

### Use a preset

**Endpoint:** `POST https://realtime.oxylabs.io/v1/queries`

**Payload:**

```json
{
    "source": "universal",
    "url": "https://sandbox.oxylabs.io/products",
    "parse": true,
    "parser_preset": "my_new_parser"
}
```

<details>

<summary>Output</summary>

```json
{
    "results": [
        {
            "content": {
                "titles": [
                    "The Legend of Zelda: Ocarina of Time",
                    "Super Mario Galaxy",
                    "Super Mario Galaxy 2",
                    "Metroid Prime",
                    "Super Mario Odyssey",
                    "Halo: Combat Evolved",
                    "The House in Fata Morgana - Dreams of the Revenants Edition -",
                    "NFL 2K1",
                    "Uncharted 2: Among Thieves",
                    "Tekken 3",
                    "The Legend of Zelda: The Wind Waker",
                    "Gran Turismo",
                    "Metal Gear Solid 2: Sons of Liberty",
                    "Grand Theft Auto Double Pack",
                    "Baldur's Gate II: Shadows of Amn",
                    "Tetris Effect: Connected",
                    "The Legend of Zelda Collector's Edition",
                    "Gran Turismo 3: A-Spec",
                    "The Legend of Zelda: A Link to the Past",
                    "The Legend of Zelda: Majora's Mask",
                    "The Last of Us",
                    "Persona 5 Royal",
                    "The Last of Us Remastered",
                    "The Legend of Zelda: Ocarina of Time 3D",
                    "Chrono Cross",
                    "Gears of War",
                    "Sid Meier's Civilization II",
                    "Halo 3",
                    "Ninja Gaiden Black",
                    "Super Mario Advance 4: Super Mario Bros. 3",
                    "Jet Grind Radio",
                    "Grim Fandango"
                ],
                "parse_status_code": 12000
            },
            "created_at": "2025-10-27 11:41:18",
            "updated_at": "2025-10-27 11:41:19",
            "page": 1,
            "url": "https://sandbox.oxylabs.io/products",
            "job_id": "7388540292158203905",
            "is_render_forced": false,
            "status_code": 200,
            "type": "parsed",
            "parser_type": "preset",
            "parser_preset": "my_new_parser"
        }
    ]
}
```

</details>
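
The job payload and the response shape can be handled with two small functions. This is a hedged Python sketch: `build_scrape_payload` and `parsed_content` are hypothetical helpers, and the `url` parameter name is an assumption. The remaining keys follow the payload and output shown in this section.

```python
def build_scrape_payload(url, preset_name):
    """Build a scraping job payload that references a saved preset."""
    return {
        "source": "universal",
        "url": url,  # assumption: the target page is passed as `url`
        "parse": True,
        "parser_preset": preset_name,
    }


def parsed_content(response):
    """Collect the parsed `content` object of every result in a response."""
    return [result["content"] for result in response.get("results", [])]


payload = build_scrape_payload(
    "https://sandbox.oxylabs.io/products", "my_new_parser"
)
print(payload)

# A trimmed response in the shape shown in the output above.
sample = {"results": [{"content": {"titles": ["Tekken 3", "Halo 3"]}}]}
print(parsed_content(sample)[0]["titles"])
```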

### Retrieve a preset

**Endpoint:** `GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}`

<details>

<summary>Output</summary>

```json
{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:40:22"
}
```

</details>

### Update a preset

**Endpoint:** `PUT https://data.oxylabs.io/v1/parsers/presets/{preset_name}`

Define the preset fields you want to update. In the following example, only `parsing_instructions` is updated.

**Payload:**

```json
{
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        },
        "prices": {
            "_fns": [
                {
                    "_args": [".price-wrapper"],
                    "_fn": "css"
                },
                {"_fn": "element_text"}
            ]
        }
    }
}
```

<details>

<summary>Output</summary>

```json
{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extract text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "prices": {
            "_fns": [
                {
                    "_args": [
                        ".price-wrapper"
                    ],
                    "_fn": "css"
                },
                {
                    "_fn": "element_text"
                }
            ]
        },
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:44:24"
}
```

</details>

### Delete a preset

**Endpoint:** `DELETE https://data.oxylabs.io/v1/parsers/presets/{preset_name}`

### List all presets

**Endpoint:** `GET https://data.oxylabs.io/v1/parsers/presets`

<details>

<summary>Output</summary>

```json
[
    {
        "id": 421950,
        "name": "books_parser",
        "description": "Parses all book titles on the page.",
        "prompt_text": null,
        "prompt_schema": null,
        "urls": [],
        "render": false,
        "parsing_instructions": {
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h3//text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "self_heal": false,
        "heal_status": "disabled",
        "last_healed_at": null,
        "created_at": "2025-10-27 11:46:59",
        "updated_at": "2025-10-27 11:46:59"
    },
    {
        "id": 421947,
        "name": "my_new_parser",
        "description": "Extract text from all H4 elements on the page.",
        "prompt_text": null,
        "prompt_schema": null,
        "urls": [],
        "render": false,
        "parsing_instructions": {
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h4/text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "self_heal": false,
        "heal_status": "disabled",
        "last_healed_at": null,
        "created_at": "2025-10-27 11:40:22",
        "updated_at": "2025-10-27 11:45:20"
    }
]
```

</details>

### View statistics

**Endpoint:** `GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats`

<details>

<summary>Output</summary>

```json
{
    "total_results": 9,
    "successful_results": 9,
    "success_rate": 100,
    "success_rate_by_path": {
        "titles": 100
    }
}
```

</details>
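
One way to act on the statistics is to flag parsed fields that fall below a target success rate. The Python sketch below uses a hypothetical `paths_below` helper and assumes that rates are percentages from 0 to 100, as the sample output suggests.

```python
def paths_below(stats, threshold=100):
    """Return parsed fields whose success rate is below `threshold` percent."""
    by_path = stats.get("success_rate_by_path", {})
    return sorted(path for path, rate in by_path.items() if rate < threshold)


# The statistics object shown in the output above.
stats = {
    "total_results": 9,
    "successful_results": 9,
    "success_rate": 100,
    "success_rate_by_path": {"titles": 100},
}
print(paths_below(stats))  # prints []
```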

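You can **filter results by date and time** using the `date_from` and/or `date_to` URL parameters. Use the format `YYYY-MM-DDTHH`, where `T` marks the start of the time part and `HH` is the hour in 24-hour time.

For example, to get statistics from 9 AM to 2 PM on August 5, 2025: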

```url
https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats?date_from=2025-08-05T09&date_to=2025-08-05T14
```
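
The filter URL can be generated from `datetime` values so that the hour is always two digits, as the `YYYY-MM-DDTHH` format calls for. This is a minimal Python sketch, and `stats_url` is a hypothetical helper name.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

HOUR_FORMAT = "%Y-%m-%dT%H"  # YYYY-MM-DDTHH with a zero-padded 24-hour value


def stats_url(preset_name, date_from=None, date_to=None):
    """Build the stats URL with optional date_from and date_to filters."""
    url = (
        "https://data.oxylabs.io/v1/parsers/presets/"
        f"{quote(preset_name, safe='')}/stats"
    )
    params = {}
    if date_from is not None:
        params["date_from"] = date_from.strftime(HOUR_FORMAT)
    if date_to is not None:
        params["date_to"] = date_to.strftime(HOUR_FORMAT)
    return f"{url}?{urlencode(params)}" if params else url


print(stats_url(
    "my_new_parser",
    date_from=datetime(2025, 8, 5, 9),
    date_to=datetime(2025, 8, 5, 14),
))
```

Both filters are optional, so passing only one of them produces a URL with a single query parameter.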

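### Track self-healing changes

**Endpoint:** `GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}/changelog`

Our system automatically logs self-healing activity. You can access this history log to review all modifications made by the self-healing feature.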

