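Getting started

Learn how to use Oxylabs Custom Parser. On this page you'll find a complete example, tips, and details on what happens when parsing fails.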

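How to use Custom Parser

Scenario example

Suppose you want to parse the total number of results that Bing Search returns for the search term test.

We'll outline three main ways to achieve this: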

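Generate parsers with OxyCopilot

OxyCopilot lets you describe your needs in plain English to automatically create scrapers and parsers for websites. To learn the basics, follow the steps below, and see the OxyCopilot documentation for more information.

Open the Web Scraper API Playground in the dashboard to access OxyCopilot.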

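Enter URL(s)

Click the OxyCopilot button in the top-left corner and enter up to 3 URLs of the same page type. We use this Bing Search URL: https://www.bing.com/search?q=test

You can also configure the scraper manually by filling in the Website, Scraper, and URL fields at the top and adjusting other parameters, such as JavaScript rendering, in the left-hand menu.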

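Set scraper parameters

Next, specify the scraper parameters and browser instructions, and enable JavaScript rendering if the target website requires it.

For Bing Search, enable JavaScript rendering and then click Next.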

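Write a prompt

Describe the data you want to extract from the page. Make sure the description is clear and covers the most important information. You can find prompt examples for popular websites in our OxyCopilot prompt library.

Paste the following prompt to extract the total number of results from the Bing Search page:

Parse the number of total search results.

Click the Generate instructions button to send your prompt.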

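Review parsed data and instructions

Once OxyCopilot finishes, you'll see a results window with the parsed data on the right.

You can make any adjustments here: modify the URL(s), refine the prompt, enable JavaScript rendering, or edit the parsing schema to fit your needs. After updating any field in this window, select Start new request to run the request again with your changes.

You can also view and directly edit the parsing instructions in this window.

When you're happy with the result, click Load instructions to continue.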

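Save the parser as a preset

You can easily save parsing instructions as a parser preset, which lets you reuse them in OxyCopilot and in API requests.

In the Web Scraper API Playground, you can choose which user to save the preset for. Once that's set, just click Save.

A window then pops up, prompting you to name the preset and add an optional description.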

Use the preset in API requests

To use a preset in a Web Scraper API request, set parse to true and specify the preset name with the parser_preset parameter.

Endpoint: POST https://data.oxylabs.io/v1/queries

{
    "source": "bing_search",
    "query": "test",
    "render": "html",
    "parse": true,
    "parser_preset": "Bing_total_results"
}

Running the request returns the following JSON output:

{
    "results": [
        {
            "content": {
                "parse_status_code": 12000,
                "total_search_results": 12000000
            },
            "created_at": "2025-10-24 09:29:28",
            "updated_at": "2025-10-24 09:30:42",
            "page": 1,
            "url": "https://www.bing.com/search?q=test",
            "job_id": "7387419953164488705",
            "is_render_forced": false,
            "status_code": 200,
            "type": "parsed",
            "parser_type": "preset",
            "parser_preset": "Bing_total_results"
        }
    ]
}
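As a rough illustration of the request above, the following Python sketch prepares the same job with only the standard library and reads the parsed value from a response shaped like the sample output. The helper names are hypothetical, and HTTP Basic authentication with your API user credentials is an assumption that this page does not state.

```python
import base64
import json
import urllib.request

API_URL = "https://data.oxylabs.io/v1/queries"


def build_preset_request(username, password, query, preset_name):
    """Prepare (but do not send) a POST request that parses with a saved preset."""
    payload = {
        "source": "bing_search",
        "query": query,
        "render": "html",
        "parse": True,
        "parser_preset": preset_name,
    }
    # Assumption: the API accepts HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def first_content(response_body):
    """Return the parsed `content` object of the first result."""
    return response_body["results"][0]["content"]


# Sending the request is left to the caller, for example:
#   request = build_preset_request("USERNAME", "PASSWORD", "test", "Bing_total_results")
#   with urllib.request.urlopen(request) as response:
#       content = first_content(json.load(response))
```

Keeping request construction separate from sending makes it easy to inspect the payload before any traffic is generated.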

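Advanced usage

Generate parsers via API

Instead of using OxyCopilot in the Playground, you can send a prompt directly to Web Scraper API and generate a parser. See the Generating parsing instructions via API documentation page to learn more.

We recommend providing 3-5 URLs of the same type (for example, product pages). This helps the parser adapt to different layouts and improves parsing accuracy.

Endpoint: POST https://data.oxylabs.io/v1/parsers/generate-instructions/prompt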

{
    "prompt_text": "Parse the number of total search results.",
    "urls": ["https://www.bing.com/search?q=test"],
    "render": true
}

Output

{
    "parsing_instructions": {
        "total_search_results": {
            "_fns": [
                {
                    "_args": [
                        "//span[contains(@class, 'count')]/text()"
                    ],
                    "_fn": "xpath_one"
                },
                {
                    "_fn": "amount_from_string"
                }
            ]
        }
    },
    "prompt_schema": {
        "properties": {
            "total_search_results": {
                "description": "The number of total search results.",
                "title": "Total Search Results",
                "type": "number"
            }
        },
        "required": [
            "total_search_results"
        ],
        "title": "Fields",
        "type": "object"
    }
}
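The request body for this endpoint can also be assembled programmatically. The sketch below is one possible approach: `build_generate_payload` and `field_names` are hypothetical helpers, and the duplicate-URL cleanup is a local convenience rather than an API requirement.

```python
GENERATE_URL = "https://data.oxylabs.io/v1/parsers/generate-instructions/prompt"


def build_generate_payload(prompt_text, urls, render=True):
    """Build the JSON body for a generate-instructions request."""
    # Drop duplicate URLs while keeping their original order.
    unique_urls = list(dict.fromkeys(urls))
    if not unique_urls:
        raise ValueError("at least one URL is required")
    return {"prompt_text": prompt_text, "urls": unique_urls, "render": render}


def field_names(generated):
    """List the top-level fields that the generated instructions will produce."""
    return sorted(generated["parsing_instructions"])
```

Checking `field_names` against the fields you asked for in the prompt is a quick sanity check before saving the instructions.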

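Save parser presets via API

Web Scraper API lets you save parsing instructions as reusable parser presets. See the Parser presets documentation for the list of available operations and complete code examples.

Endpoint: POST https://data.oxylabs.io/v1/parsers/presets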

{
    "name": "Bing_total_results",
    "parsing_instructions": {
        "total_search_results": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": [
                        "//span[contains(@class, 'count')]/text()"
                    ]
                },
                {
                    "_fn": "amount_from_string"
                }
            ]
        }
    }
}

Output

{
    "id": 421938,
    "name": "Bing_total_results",
    "description": null,
    "prompt_text": null,
    "prompt_schema": null,
    "urls": [],
    "render": false,
    "parsing_instructions": {
        "total_search_results": {
            "_fns": [
                {
                    "_args": [
                        "//span[contains(@class, 'count')]/text()"
                    ],
                    "_fn": "xpath_one"
                },
                {
                    "_fn": "amount_from_string"
                }
            ]
        }
    },
    "self_heal": false,
    "heal_status": "disabled",
    "last_healed_at": null,
    "created_at": "2025-10-27 09:28:37",
    "updated_at": "2025-10-27 09:28:37"
}
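To connect the two endpoints, generated instructions can be wrapped into a preset payload and the save response compared with what was sent. This is a minimal sketch with hypothetical helper names; it relies only on the `name` and `parsing_instructions` fields shown in the samples above.

```python
def preset_payload(name, generated):
    """Wrap generated parsing instructions into a preset-creation body."""
    if not name:
        raise ValueError("preset name must not be empty")
    return {"name": name, "parsing_instructions": generated["parsing_instructions"]}


def saved_as_sent(payload, saved):
    """Check that the saved preset echoes the name and instructions that were sent."""
    # Dict comparison ignores key order, so "_fn"/"_args" may appear in any order.
    return (
        saved.get("name") == payload["name"]
        and saved.get("parsing_instructions") == payload["parsing_instructions"]
    )
```

The comparison is useful because the response lists `_args` before `_fn` while the request lists them the other way round, which is not a real difference.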

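Write instructions manually

To use Custom Parser manually, include a set of parsing_instructions when creating a job. You can use CSS and XPath selectors to locate elements in the DOM.

Follow the step-by-step example below to learn the basics, then see our in-depth guide on writing instructions manually for advanced techniques and detailed documentation.

For the Bing Search scenario, the job parameters look like this: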

{
    "source": "bing_search",
    "query": "test",
    "render": "html",
    "parse": true,
    "parsing_instructions": {
        "number_of_results": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": [".//span[@class='sb_count']/text()"]
                }
            ]
        }
    }
}

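Step 1. You must provide the "parse": true parameter.

Step 2. Parsing instructions must be described in the "parsing_instructions" field.

The sample parsing instructions above specify that the goal is to parse the number of search results from the scraped document and put the result in the number_of_results field. How to parse the field is defined as a "pipeline":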

"_fns": [
    {
        "_fn": "xpath_one",
        "_args": [".//span[@class='sb_count']/text()"]
    }
]

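The pipeline describes a list of data processing functions to execute. Functions run in the order they appear in the list, and each one takes the output of the previous function as its input.

The sample pipeline above uses the xpath_one function (see the full list of available functions). It lets you process an HTML document using XPath expressions and XSLT functions. As the function argument, specify the exact path where the target element can be found: .//span[@class='sb_count']. You can also instruct the parser to select the text() found in the target element.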
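The execution order of a pipeline, where each function receives the previous function's output, can be illustrated with a small local emulation. This is only a sketch: the functions in `LOCAL_FUNCTIONS` are hypothetical stand-ins and not the service's functions, and treating `_args` as extra positional arguments is a simplification made for this example.

```python
# Hypothetical local stand-ins; they are NOT the functions offered by the API.
LOCAL_FUNCTIONS = {
    "split_words": lambda value: value.split(),
    "take": lambda value, index: value[index],
    "strip_commas": lambda value: value.replace(",", ""),
}


def run_pipeline(fns, value):
    """Apply each step in order, feeding every output into the next step."""
    for step in fns:
        function = LOCAL_FUNCTIONS[step["_fn"]]
        value = function(value, *step.get("_args", []))
    return value


pipeline = [
    {"_fn": "split_words"},
    {"_fn": "take", "_args": [1]},
    {"_fn": "strip_commas"},
]
cleaned = run_pipeline(pipeline, "About 16,700,000 results")
```

Reordering the steps changes the result or raises an error, which is why the order of entries in `_fns` matters.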

The parsing result of the example job above should look like this:

{
    "results": [
        {
            "content": {
                "number_of_results": "About 16,700,000 results",
                "parse_status_code": 12000
            },
            "created_at": "2025-10-27 09:48:04",
            "updated_at": "2025-10-27 09:48:38",
            "page": 1,
            "url": "https://www.bing.com/search?q=test",
            "job_id": "7388511797231226881",
            "is_render_forced": false,
            "status_code": 200,
            "type": "parsed",
            "parser_type": "custom",
            "parser_preset": null
        }
    ]
}

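Custom Parser not only extracts text from the scraped HTML but can also run basic data processing functions.

For example, the parsing instructions described above extract number_of_results as text that contains extra keywords you may not need. To get the number of results for query=test as a numeric data type, you can reuse the same parsing instructions and add the amount_from_string function to the existing pipeline: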

{
    "source": "bing_search",
    "query": "test",
    "render": "html",
    "parse": true,
    "parsing_instructions": {
        "number_of_results": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": [".//span[@class='sb_count']/text()"]
                },
                {
                    "_fn": "amount_from_string"
                }
            ]
        }
    }
}

The parsing result of the example job above should look like this:

{
    "results": [
        {
            "content": {
                "number_of_results": 14200000,
                "parse_status_code": 12000
            },
            "created_at": "2025-10-27 10:00:36",
            "updated_at": "2025-10-27 10:01:05",
            "page": 1,
            "url": "https://www.bing.com/search?q=test",
            "job_id": "7388514950961963009",
            "is_render_forced": false,
            "status_code": 200,
            "type": "parsed",
            "parser_type": "custom",
            "parser_preset": null
        }
    ]
}
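Reusing existing instructions and appending one more function to a field's pipeline can be done without editing JSON by hand. The helper below is a hypothetical sketch that copies the job first, so the original parameters stay unchanged.

```python
import copy


def with_extra_step(job, field, fn_name, args=None):
    """Return a copy of `job` with one more function appended to a field's pipeline."""
    updated = copy.deepcopy(job)
    step = {"_fn": fn_name}
    if args is not None:
        step["_args"] = list(args)
    updated["parsing_instructions"][field]["_fns"].append(step)
    return updated


text_job = {
    "source": "bing_search",
    "query": "test",
    "render": "html",
    "parse": True,
    "parsing_instructions": {
        "number_of_results": {
            "_fns": [
                {"_fn": "xpath_one", "_args": [".//span[@class='sb_count']/text()"]}
            ]
        }
    },
}
numeric_job = with_extra_step(text_job, "number_of_results", "amount_from_string")
```

Here `numeric_job` carries the two-step pipeline from the example above, while `text_job` still has the single `xpath_one` step.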

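What happens if parsing fails when using Custom Parser

If Custom Parser fails to process the client-defined parsing instructions, we return the 12005 status code (parsed with warnings). The following job, for example, contains an XPath expression that matches nothing: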

{
    "source": "bing_search",
    "query": "test",
    "render": "html",
    "parse": true,
    "parsing_instructions": {
        "number_of_results": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": [".//span[@class='sb_count']/text()"]
                },
                {
                    "_fn": "amount_from_string"
                }
            ]
        },
        "number_of_organics": {
            "_fns": [
                {
                    "_fn": "xpath",
                    "_args": ["//this-will-not-match-anything"]
                },
                {
                    "_fn": "length"
                }
            ]
        }
    }
}

You will be charged for such results:

{
    "results": [
        {
            "content": {
                "_warnings": [
                    {
                        "_fn": "xpath",
                        "_msg": "XPath expressions did not match any data.",
                        "_path": ".number_of_organics",
                        "_fn_idx": 0
                    }
                ],
                "number_of_results": 14200000,
                "parse_status_code": 12005,
                "number_of_organics": null
            },
            "created_at": "2025-10-27 10:03:54",
            "updated_at": "2025-10-27 10:04:22",
            "page": 1,
            "url": "https://www.bing.com/search?q=test",
            "job_id": "7388515782126234625",
            "is_render_forced": false,
            "status_code": 200,
            "type": "parsed",
            "parser_type": "custom",
            "parser_preset": null
        }
    ]
}

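If Custom Parser encounters an exception and breaks during the parsing operation, it may return one of these status codes: 12002, 12006, 12007. You will not be charged for these unexpected errors.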
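Client code may want to branch on these status codes. The following sketch classifies a result's `content` object using only the codes mentioned on this page. Treating 12000 as success is inferred from the sample outputs, the `_warnings` key is an assumption about the response shape, and the labels are hypothetical names chosen for this example.

```python
# 12000 is inferred from the successful sample outputs on this page.
PARSED_OK = 12000
PARSED_WITH_WARNINGS = 12005
UNEXPECTED_FAILURES = {12002, 12006, 12007}


def summarize_parse(content):
    """Return (label, warning_messages) for one result's `content` object."""
    code = content.get("parse_status_code")
    if code == PARSED_OK:
        return "ok", []
    if code == PARSED_WITH_WARNINGS:
        # Assumption: warnings are listed under a `_warnings` key.
        warnings = content.get("_warnings", [])
        return "parsed_with_warnings", [
            f'{w.get("_path")}: {w.get("_msg")}' for w in warnings
        ]
    if code in UNEXPECTED_FAILURES:
        return "failed_not_charged", []
    return "unknown", []
```

A result labeled `parsed_with_warnings` still contains usable fields, so it is usually worth logging the messages rather than discarding the whole result.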

Status codes

See the status codes documentation for the full list.
