# Parser Presets

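You can **save**, **reuse**, and **modify** custom parsing instructions through the Web Scraper API. Once you create a parser preset, we host it in our system so that you can reference it in scraping jobs with the `parser_preset` parameter in the payload.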

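This feature provides several **key capabilities**:

* Save and manage your own parsers in our system
* Reuse presets easily across multiple scraping jobs
* Create, retrieve, update, delete, and list all presets
* Access performance and usage statistics for your presets
* Adapt to changing websites with self-healing presets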

## API reference

**Endpoint:** `https://data.oxylabs.io/v1/parsers/presets`

The table below lists each available operation and its endpoint path:

<table><thead><tr><th width="247.30859375">Operation</th><th width="152.23828125">Request method</th><th>Path</th></tr></thead><tbody><tr><td><strong>Create</strong> a preset</td><td><code>POST</code></td><td><code>/v1/parsers/presets</code></td></tr><tr><td><strong>Retrieve</strong> a preset</td><td><code>GET</code></td><td><code>/v1/parsers/presets/{preset_name}</code></td></tr><tr><td><strong>Update</strong> a preset</td><td><code>PUT</code></td><td><code>/v1/parsers/presets/{preset_name}</code></td></tr><tr><td><strong>Delete</strong> a preset</td><td><code>DELETE</code></td><td><code>/v1/parsers/presets/{preset_name}</code></td></tr><tr><td><strong>List all</strong> presets</td><td><code>GET</code></td><td><code>/v1/parsers/presets</code></td></tr><tr><td><strong>View usage</strong> and <strong>performance</strong> statistics</td><td><code>GET</code></td><td><code>/v1/parsers/presets/{preset_name}/stats</code></td></tr><tr><td><strong>Track self-healing</strong> changes</td><td><code>GET</code></td><td><code>/v1/parsers/presets/{preset_name}/changelog</code></td></tr></tbody></table>
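
As a rough illustration, the operations in the table can be captured in a small lookup that returns the HTTP method and full URL for each one. This is only a sketch: the operation keys (`create`, `retrieve`, and so on) and the `preset_request` helper are hypothetical names chosen for this example, and authentication is left out because this page does not cover it.

```python
from urllib.parse import quote

BASE_URL = "https://data.oxylabs.io"

# Operation key (hypothetical) -> (HTTP method, path template from the table).
OPERATIONS = {
    "create": ("POST", "/v1/parsers/presets"),
    "retrieve": ("GET", "/v1/parsers/presets/{preset_name}"),
    "update": ("PUT", "/v1/parsers/presets/{preset_name}"),
    "delete": ("DELETE", "/v1/parsers/presets/{preset_name}"),
    "list": ("GET", "/v1/parsers/presets"),
    "stats": ("GET", "/v1/parsers/presets/{preset_name}/stats"),
    "changelog": ("GET", "/v1/parsers/presets/{preset_name}/changelog"),
}


def preset_request(operation, preset_name=None):
    """Return (method, url) for one of the operations listed in the table."""
    method, template = OPERATIONS[operation]
    if "{preset_name}" in template:
        if not preset_name:
            raise ValueError(f"'{operation}' requires a preset name")
        # Percent-encode the name so it is safe to place in a URL path.
        template = template.replace("{preset_name}", quote(preset_name, safe=""))
    return method, BASE_URL + template


print(preset_request("stats", "my_new_parser"))
```

The returned pair can be handed to whichever HTTP client you already use.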

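## Enabling self-healing

Parser presets come with a self-healing feature that helps maintain your parsers and their success rate when websites change. Once enabled, a parser preset **repairs itself automatically**, adjusting its parsing instructions in the background with no additional manual input.

To **enable self-healing** for a custom parser preset, include the following required parameters when creating or updating the preset:

<table><thead><tr><th width="222.90234375">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><code>self_heal</code></td><td>Enables self-healing when set to <code>true</code>.</td></tr><tr><td><code>prompt_schema</code></td><td>A JSON schema describing the desired parser output. You can generate one as described in <a href="/pages/b093a36bab08203ddafe804951101a0956b1f771">Generating parsers via API</a>.</td></tr><tr><td><code>urls</code></td><td>A list of up to 5 URLs of the same page type. We recommend providing 3-5 URLs to help the parser adapt to different layouts and improve parsing accuracy.</td></tr></tbody></table>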

<details>

<summary>Payload example</summary>

The payload example shown here enables self-healing by updating an existing preset.

**Endpoint:** `PUT https://data.oxylabs.io/v1/parsers/presets/{preset_name}`

```json
{
    "self_heal": true,
    "urls": ["https://sandbox.oxylabs.io/products"],
    "prompt_schema": {
        "properties": {
            "product_titles": {
                "description": "Title of each product.",
                "items": {
                    "type": "string"
                },
                "maxItems": 5,
                "title": "Product Titles",
                "type": "array"
            }
        },
        "required": [
            "product_titles"
        ],
        "title": "Fields",
        "type": "object"
    }
}
```

</details>
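
Before sending a payload that enables self-healing, it can help to check locally that the three required parameters are present. The sketch below is a client-side convenience only, not the API's own validation; `check_self_heal_payload` is a hypothetical helper, and the lower bound of one URL is an assumption (the table only states the maximum of 5).

```python
def check_self_heal_payload(payload):
    """Return a list of problems; an empty list means the payload looks complete."""
    problems = []
    if payload.get("self_heal") is not True:
        problems.append("self_heal must be true")
    if not isinstance(payload.get("prompt_schema"), dict):
        problems.append("prompt_schema must be a JSON schema object")
    urls = payload.get("urls")
    # Assumption: at least one URL is needed; the documented maximum is 5.
    if not isinstance(urls, list) or not 1 <= len(urls) <= 5:
        problems.append("urls must be a list of 1 to 5 URLs")
    return problems


payload = {
    "self_heal": True,
    "urls": ["https://sandbox.oxylabs.io/products"],
    "prompt_schema": {"type": "object", "properties": {}},
}
print(check_self_heal_payload(payload))
```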

## Usage examples

### Create a preset

**Endpoint:** `POST https://data.oxylabs.io/v1/parsers/presets`

**Payload:**

```json
{
    "name": "my_new_parser",
    "description": "Extracts text from all H4 elements on the page.",
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        }
    }
}
```

<details>

<summary>Output</summary>

```json
{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extracts text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:40:22"
}
```

</details>
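
When a preset has several XPath-based fields, writing the nested `_fns` structure by hand gets repetitive. The following sketch generates a create payload in the same shape as the example above from a plain mapping of field names to XPath expressions. `xpath_field` and `build_create_payload` are hypothetical helpers for illustration only.

```python
import json


def xpath_field(expression):
    """Wrap one XPath expression in the `_fns` structure used by the payload."""
    return {"_fns": [{"_fn": "xpath", "_args": [expression]}]}


def build_create_payload(name, description, xpath_by_field):
    """Build a create-preset payload from a {field name: XPath} mapping."""
    return {
        "name": name,
        "description": description,
        "parsing_instructions": {
            field: xpath_field(expression)
            for field, expression in xpath_by_field.items()
        },
    }


payload = build_create_payload(
    "my_new_parser",
    "Extracts text from all H4 elements on the page.",
    {"titles": "//h4/text()"},
)
print(json.dumps(payload, indent=4))
```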

### Use a preset

**Endpoint:** `POST https://realtime.oxylabs.io/v1/queries`

**Payload:**

```json
{
    "source": "universal",
    "url": "https://sandbox.oxylabs.io/products",
    "parse": true,
    "parser_preset": "my_new_parser"
}
```

<details>

<summary>Output</summary>

```json
{
    "results": [
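        {
            "content": {
                "titles": [
                    "The Legend of Zelda: Ocarina of Time",
                    "Super Mario Galaxy",
                    "Super Mario Galaxy 2",
                    "Metroid Prime",
                    "Super Mario Odyssey",
                    "Halo: Combat Evolved",
                    "The House in Fata Morgana - Dreams of the Revenants Edition -",
                    "NFL 2K1",
                    "Uncharted 2: Among Thieves",
                    "Tekken 3",
                    "The Legend of Zelda: The Wind Waker",
                    "Gran Turismo",
                    "Metal Gear Solid 2: Sons of Liberty",
                    "Grand Theft Auto Double Pack",
                    "Baldur's Gate II: Shadows of Amn",
                    "Tetris Effect: Connected",
                    "The Legend of Zelda Collector's Edition",
                    "Gran Turismo 3: A-Spec",
                    "The Legend of Zelda: A Link to the Past",
                    "The Legend of Zelda: Majora's Mask",
                    "The Last of Us",
                    "Persona 5 Royal",
                    "The Last of Us Remastered",
                    "The Legend of Zelda: Ocarina of Time 3D",
                    "Chrono Cross",
                    "Gears of War",
                    "Sid Meier's Civilization II",
                    "Halo 3",
                    "Ninja Gaiden Black",
                    "Super Mario Advance 4: Super Mario Bros. 3",
                    "Jet Grind Radio",
                    "Grim Fandango"
                ],
                "parse_status_code": 12000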
            },
            "created_at": "2025-10-27 11:41:18",
            "updated_at": "2025-10-27 11:41:19",
            "page": 1,
            "url": "https://sandbox.oxylabs.io/products",
            "job_id": "7388540292158203905",
            "is_render_forced": false,
            "status_code": 200,
            "type": "parsed",
            "parser_type": "preset",
            "parser_preset": "my_new_parser"
            "parser_type": "preset",
            "parser_preset": "my_new_parser"
        }
    ]
}
```

</details>
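
To show how the request and response fit together, here is a minimal sketch that builds a job payload referencing a preset and pulls the parsed `content` out of a response shaped like the output above. Both helpers are hypothetical, the sample response is trimmed down for brevity, and sending the request (including authentication) is left to your HTTP client.

```python
def build_job_payload(url, preset_name):
    """Build a job payload that parses `url` with a hosted preset."""
    return {
        "source": "universal",
        "url": url,  # target page; the field name "url" is assumed here
        "parse": True,
        "parser_preset": preset_name,
    }


def parsed_content(response):
    """Return the `content` of every result that was parsed with a preset."""
    return [
        result["content"]
        for result in response.get("results", [])
        if result.get("parser_type") == "preset"
    ]


# Trimmed-down response in the shape of the output above.
sample_response = {
    "results": [
        {
            "content": {"titles": ["NFL 2K1", "Jet Grind Radio"]},
            "status_code": 200,
            "parser_type": "preset",
            "parser_preset": "my_new_parser",
        }
    ]
}

print(build_job_payload("https://sandbox.oxylabs.io/products", "my_new_parser"))
print(parsed_content(sample_response)[0]["titles"])
```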

### Retrieve a preset

**Endpoint:** `GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}`

<details>

<summary>Output</summary>

```json
{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extracts text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:40:22"
}
```

</details>

### Update a preset

**Endpoint:** `PUT https://data.oxylabs.io/v1/parsers/presets/{preset_name}`

Define the preset fields you want to update. In the following example, only `parsing_instructions` is updated.

**Payload:**

```json
{
    "parsing_instructions": {
        "titles": {
            "_fns": [
                {
                    "_args": ["//h4/text()"],
                    "_fn": "xpath"
                }
            ]
        },
        "prices": {
            "_fns": [
                {
                    "_args": [".price-wrapper"],
                    "_fn": "css"
                },
                {"_fn": "element_text"}
            ]
        }
    }
}
```

<details>

<summary>Output</summary>

```json
{
    "id": 421947,
    "name": "my_new_parser",
    "description": "Extracts text from all H4 elements on the page.",
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "prices": {
            "_fns": [
                {
                    "_args": [
                        ".price-wrapper"
                    ],
                    "_fn": "css"
                },
                {
                    "_fn": "element_text"
                }
            ]
        },
        "titles": {
            "_fns": [
                {
                    "_args": [
                        "//h4/text()"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 11:40:22",
    "updated_at": "2025-10-27 11:44:24"
}
```

</details>
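
The update payload above sends both `titles` and `prices`, which suggests that `parsing_instructions` is supplied as a complete object rather than merged field by field. Assuming that holds (this page does not state it explicitly), one cautious approach is to start from the retrieved preset, add the new field locally, and send the full set of instructions back. `with_css_text_field` is a hypothetical helper.

```python
import copy


def with_css_text_field(preset, field, selector):
    """Return an update payload that keeps existing fields and adds a CSS one."""
    # Copy first so the retrieved preset is not modified in place.
    instructions = copy.deepcopy(preset["parsing_instructions"])
    instructions[field] = {
        "_fns": [
            {"_fn": "css", "_args": [selector]},
            {"_fn": "element_text"},
        ]
    }
    return {"parsing_instructions": instructions}


# Shape of a retrieved preset, reduced to the part used here.
retrieved = {
    "name": "my_new_parser",
    "parsing_instructions": {
        "titles": {"_fns": [{"_fn": "xpath", "_args": ["//h4/text()"]}]}
    },
}

update_payload = with_css_text_field(retrieved, "prices", ".price-wrapper")
print(sorted(update_payload["parsing_instructions"]))
```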

### Delete a preset

**Endpoint:** `DELETE https://data.oxylabs.io/v1/parsers/presets/{preset_name}`

### List all presets

**Endpoint:** `GET https://data.oxylabs.io/v1/parsers/presets`

<details>

<summary>Output</summary>

```json
[
    {
        "id": 421950,
        "name": "books_parser",
        "description": "Parses all book titles on the page.",
        "prompt_text": null,
        "prompt_schema": null,
        "urls": [],
        "render": false,
        "parsing_instructions": {
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h3//text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "self_heal": false,
        "heal_status": "disabled",
        "last_healed_at": null,
        "created_at": "2025-10-27 11:46:59",
        "updated_at": "2025-10-27 11:46:59"
    },
    {
        "id": 421947,
        "name": "my_new_parser",
        "description": "Extracts text from all H4 elements on the page.",
        "prompt_text": null,
        "prompt_schema": null,
        "urls": [],
        "render": false,
        "parsing_instructions": {
            "titles": {
                "_fns": [
                    {
                        "_args": [
                            "//h4/text()"
                        ],
                        "_fn": "xpath"
                    }
                ]
            }
        },
        "self_heal": false,
        "heal_status": "disabled",
        "last_healed_at": null,
        "created_at": "2025-10-27 11:40:22",
        "updated_at": "2025-10-27 11:45:20"
    }
]
```

</details>
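
Because the list response is a plain JSON array, it is easy to audit locally. As one possible use, the sketch below picks out the presets that do not have self-healing enabled, using only the `name` and `self_heal` fields shown in the output above. `presets_without_self_heal` is a hypothetical helper and the input is a shortened copy of that output.

```python
def presets_without_self_heal(presets):
    """Return the sorted names of presets whose `self_heal` flag is not set."""
    return sorted(p["name"] for p in presets if not p.get("self_heal"))


# Shortened copy of the list output, keeping only the fields used here.
listed = [
    {"id": 421950, "name": "books_parser", "self_heal": False},
    {"id": 421947, "name": "my_new_parser", "self_heal": False},
]

print(presets_without_self_heal(listed))
```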

### View statistics

**Endpoint:** `GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats`

<details>

<summary>Output</summary>

```json
{
    "total_results": 9,
    "successful_results": 9,
    "success_rate": 100,
    "success_rate_by_path": {
        "titles": 100
    }
}
```

</details>
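
The per-path rates in `success_rate_by_path` make it possible to spot which fields of a preset are struggling. Here is a small sketch that flags such fields, assuming the rates are percentages from 0 to 100 as the sample output suggests. `paths_below` and the default threshold of 90 are arbitrary choices for illustration.

```python
def paths_below(stats, threshold=90):
    """Return the sorted field paths whose success rate is under `threshold`."""
    rates = stats.get("success_rate_by_path", {})
    return sorted(path for path, rate in rates.items() if rate < threshold)


# Invented numbers in the shape of the stats output, for illustration only.
example_stats = {
    "total_results": 20,
    "successful_results": 15,
    "success_rate": 75,
    "success_rate_by_path": {"titles": 100, "prices": 75},
}

print(paths_below(example_stats))
```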

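You can **filter results by date and time** with the `date_from` and/or `date_to` URL parameters. Use the format `YYYY-MM-DDTHH`, where `T` marks the start of the time part and `HH` is the hour in 24-hour notation.

For example, to get statistics from 9 AM to 2 PM on August 5, 2025: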

```url
https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats?date_from=2025-08-05T09&date_to=2025-08-05T14
```
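
A filtered stats URL can also be built programmatically, so the `YYYY-MM-DDTHH` format comes out with a two-digit hour every time. This is a sketch only: `stats_url` is a hypothetical helper, and it assumes you pass `datetime` objects already expressed in whatever time zone the API expects, which this page does not specify.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

STATS_URL = "https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats"


def stats_url(preset_name, date_from=None, date_to=None):
    """Build the stats URL with optional date_from / date_to filters."""
    url = STATS_URL.format(preset_name=quote(preset_name, safe=""))
    params = {}
    if date_from is not None:
        # %H is the zero-padded hour in 24-hour notation.
        params["date_from"] = date_from.strftime("%Y-%m-%dT%H")
    if date_to is not None:
        params["date_to"] = date_to.strftime("%Y-%m-%dT%H")
    return f"{url}?{urlencode(params)}" if params else url


print(stats_url("my_new_parser", datetime(2025, 8, 5, 9), datetime(2025, 8, 5, 14)))
```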

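### Track self-healing changes

**Endpoint:** `GET https://data.oxylabs.io/v1/parsers/presets/{preset_name}/changelog`

Our system logs self-healing activity automatically. You can access this history log to review every modification made by the self-healing feature.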


