# OxyCopilot

**OxyCopilot** is a free [**Web Scraper API**](/products/cn/web-scraper-api.md) feature that makes getting started easier and helps users find working solutions for complex use cases, with no coding knowledge required. OxyCopilot currently includes three separate features:

* **Scraper builder**
* [**Custom Parser**](/products/cn/web-scraper-api/features/custom-parser.md) **builder**
* **Browser instructions builder**

{% hint style="success" %}
OxyCopilot is available in the [**Web Scraper API Playground**](https://dashboard.oxylabs.io/?route=/api-playground) in the Oxylabs dashboard.
{% endhint %}

## Scraper builder

OxyCopilot helps you configure a scraper (and build the request payload) for Web Scraper API without having to understand the documentation or field logic.

### How it works

#### **Step 1: Provide a URL and a prompt**

* **URL:** Provide the URL you want to scrape.
* **Prompt:** Describe your needs (for example, localization or JS rendering).

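#### **Step 2: Parsing**

You have three options for handling parsing:

1. **Custom Parser**: Select "Add parsing instructions" to use the [**Custom Parser builder**](#custom-parser-builder).
2. **Dedicated parser**: If the URL belongs to a website we provide a dedicated parser for and you want to use it, select "Continue with dedicated parser".
3. **No parsing**: If you do not need structured data, choose not to parse.

{% hint style="warning" %}
If the URL belongs to a website we have a dedicated parser for but you do not need structured data, select "Continue with dedicated parser" and disable the `parse` parameter in the playground settings. Avoid using the exit button, because it does not save the pre-filled parameters.
{% endhint %}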

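#### **Step 3: Review the request**

Based on your prompt, OxyCopilot pre-fills the necessary parameters in the Web Scraper API Playground. You will see the request code and parameters for your use case and can adjust the parameters if needed.

#### **Step 4: Submit the request and copy the code**

If everything looks right, submit the request to see the output and check that it works as expected. Then copy the request code to use Web Scraper API in your subsequent scraping jobs.

### Example

#### URL

```
https://www.amazon.de/s?k=adidas
```

#### Prompt

{% code overflow="wrap" %}
```
Scrape the Amazon search page from the provided URL and localize the results to Poland.
```
{% endcode %}

#### AI-generated parameters (JSON)

```json
{
  "source": "amazon_search",
  "query": "adidas",
  "geo_location": "PL",
  "domain": "de"
}
```

#### AI-generated request code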
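The request code generated in the Playground is not reproduced on this page. As a rough illustration only, the parameters above could be sent with Python's standard library along these lines. The endpoint `https://realtime.oxylabs.io/v1/queries`, HTTP Basic authentication, and the `USERNAME`/`PASSWORD` placeholders are assumptions that this page does not state; treat the code copied from the Playground as the reference.

```python
import base64
import json
import urllib.request

# Assumed endpoint and placeholder credentials; verify both in your dashboard.
API_URL = "https://realtime.oxylabs.io/v1/queries"
USERNAME = "USERNAME"
PASSWORD = "PASSWORD"

# Parameters generated by OxyCopilot in the example above.
payload = {
    "source": "amazon_search",
    "query": "adidas",
    "geo_location": "PL",
    "domain": "de",
}


def build_request(params: dict) -> urllib.request.Request:
    """Build a POST request carrying the payload as JSON with Basic auth."""
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        API_URL,
        data=json.dumps(params).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(params: dict) -> dict:
    """Send the request and decode the JSON response (needs valid credentials)."""
    with urllib.request.urlopen(build_request(params), timeout=180) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only inspect the request here; call send(payload) once credentials are set.
    request = build_request(payload)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the payload before any credits are spent.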

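## Custom Parser builder

Use the [**Custom Parser**](/products/cn/web-scraper-api/features/custom-parser.md) feature together with OxyCopilot to build parsers without writing code or manually analyzing website structure.

### How it works

#### **Step 1: Provide URLs and a prompt**

* **URL:** You can provide up to **3 URLs** to generate parsing instructions from. OxyCopilot uses the HTML of the provided URLs to determine the best logic for extracting the required fields.

{% hint style="info" %}
The more URLs you provide, the more robust the parsing instructions become, because OxyCopilot identifies common patterns across similar pages. Note that additional URLs may increase the time you wait for results.
{% endhint %}

* **Prompt:** The prompt is the key input for building the natural-language schema, which in turn is the basis for generating the actual parsing instructions. The prompt should clearly describe the fields that need to be parsed.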

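#### **Step 2 \[optional]: Adjust the parsing schema**

This step lets you fine-tune the parsing schema to better fit your needs or to troubleshoot issues.

#### **Parsing schema overview**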

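This table shows the input the AI uses to generate parsing instructions. The schema defines which fields to parse and is made up of several object types (see the [**table**](#object-type-explanations) below).

Each item in the schema requires a name and may carry a description:

* **Name**: Used as the object key in the parsing instructions and visible in the parsed data.
* **Description** (optional but recommended): Helps improve parsing accuracy.

### **Schema adjustments**

* **Reorder items**: Drag and drop items using the dots on the left to change their order (only items within the same nesting level can be moved).
* **Edit items**: Click the edit icon to modify any field.
* **Delete items**: You can delete any item within a parent.
* **Add new items**: Add a new item to a parent.

After updating the schema, click the **"Refresh output"** button to regenerate the instructions and preview the parsed data.

### Object type explanations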

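| Object type | Description | Parsed data example |
| --- | --- | --- |
| String | A single text output | `"title": "Example product title"` |
| Number | A single number | `"price": 9.99` |
| Array of strings | A list of text outputs | `"products": ["Product 1", "Product 2", "Product 3"]` |
| Array of numbers | A list of numbers | `"pages": [1, 2, 3]` |
| Array of objects | A list of objects/items, each containing its own objects (the `_items` block in parsing instructions) | `"related_items": [ { "title": "Product 1", "price": 9.99 }, { "title": "Product 2", "price": 15.99 } ]` |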
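As a rough illustration of how the object types in the table map onto JSON values in parsed data, the sketch below checks a value against a type. The English type labels used as keys and the `matches_type` helper are hypothetical and are not part of OxyCopilot or Web Scraper API.

```python
from typing import Any


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches_type(value: Any, object_type: str) -> bool:
    """Return True if a parsed value has the shape the object type describes."""
    if object_type == "string":
        return isinstance(value, str)
    if object_type == "number":
        return _is_number(value)
    if object_type == "array of strings":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    if object_type == "array of numbers":
        return isinstance(value, list) and all(_is_number(v) for v in value)
    if object_type == "array of objects":
        return isinstance(value, list) and all(isinstance(v, dict) for v in value)
    raise ValueError(f"Unknown object type: {object_type}")


# Sample values modeled on the table above, paired with their object type.
examples = [
    ("Example product title", "string"),
    (9.99, "number"),
    (["Product 1", "Product 2", "Product 3"], "array of strings"),
    ([1, 2, 3], "array of numbers"),
    ([{"title": "Product 1", "price": 9.99}], "array of objects"),
]

for value, object_type in examples:
    print(object_type, matches_type(value, object_type))
```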

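### Working with arrays of objects

1. **Select "Array of objects"**: This option adds a child object and a button.
2. **Fill in the object names**: To save the item to the schema, you must fill in the names of both the parent and the child object. Once this is done, the check mark turns green.
3. **Child object requirement**: An "Array of objects" must contain at least one child object.

### Testing the instructions

By default, the parsed data is based on the first URL provided in **Step 1**. You can also provide a different URL to test the parsing instructions: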

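{% hint style="warning" %}
The instructions are generated from the initial URLs and do not take the test URL into account. Editing the prompt or the URLs resets the schema, so the instructions have to be regenerated from scratch.
{% endhint %}

#### **Step 3: Copy or save the instructions and integrate them into your scraping job**

Once you are satisfied with the instructions:

* Use the **"Copy"** button to copy the instructions and paste them into your scraper code.
* Alternatively, save the instructions to your Web Scraper API Playground session, adjust other request parameters, test, and then copy the complete request code in your preferred programming language.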
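As a sketch of the first option, copied instructions can be attached to a request payload in your own code. The `parsing_instructions` field name and the `universal` source are assumptions here (this page itself only mentions the `parse` parameter), and the instruction shown is a shortened, hypothetical one; check the Custom Parser documentation for the exact payload shape.

```python
import json

# Parsing instructions as copied from OxyCopilot (shortened, illustrative only).
parsing_instructions = {
    "product_title": {
        "_fns": [
            {"_fn": "xpath_one", "_args": ["//h2/text()"]},
        ]
    }
}


def build_payload(url: str, instructions: dict) -> dict:
    """Combine a target URL with parsing instructions in one request payload."""
    return {
        "source": "universal",  # assumed source name for arbitrary URLs
        "url": url,
        "parse": True,  # assumed to be required for the instructions to apply
        "parsing_instructions": instructions,  # assumed field name
    }


payload = build_payload("https://sandbox.oxylabs.io/products/1", parsing_instructions)
print(json.dumps(payload, indent=2))
```

Keeping the instructions in their own variable makes it easy to swap in a regenerated set without touching the rest of the payload.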

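### Example

#### URL

```
https://sandbox.oxylabs.io/products/1
```

#### Prompt

{% code overflow="wrap" %}
```
I want to parse a product page. The parsed data should include the following fields:
- product_title: a text field containing the product title
- price: a numeric field containing the product price
- related_products: a list containing the titles of the related products shown below the main product information
```
{% endcode %}

#### Parsing schema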

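| Object type | Name* | Description |
| --- | --- | --- |
| String | product_title | Product title |
| Number | price | Product price |
| Array of strings | related_products | Titles of the related products below the main product information |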

#### Parsing instructions

```json
{
  "product_title": {
    "_fns": [
      {
        "_fn": "xpath_one",
        "_args": [
          "//h2[@class=\"title css-1k75zwy e1pl6npa11\"]/text()",
          "//div[@class=\"product-info-wrapper css-m2w3q2 emlf3670\"]/h2/text()",
          "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[1]/div[2]/div[2]/h2/text()"
        ]
      },
      {
        "_fn": "regex_search",
        "_args": [
          "^\\s*(.[\\s\\S]*?)\\s*$",
          1
        ]
      }
    ]
  },
  "price": {
    "_fns": [
      {
        "_fn": "xpath_one",
        "_args": [
          "//div[@class=\"price css-o7uf8d e1pl6npa6\"]/text()",
          "//div[@class=\"product-info-wrapper css-m2w3q2 emlf3670\"]/div[4]/text()",
          "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[1]/div[2]/div[2]/div[4]/text()"
        ]
      },
      {
        "_fn": "amount_from_string"
      }
    ]
  },
  "related_products": {
    "_fns": [
      {
        "_fn": "xpath",
        "_args": [
          "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]/a[1]/h4/text()",
          "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div/a[1]/h4/text()",
          "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div/a[1]/h4/text()",
          "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div/a[1]/h4/text()",
          "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
          "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
          "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
          "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
          "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
          "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
          "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
          "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()"
        ]
      },
      {
        "_fn": "regex_search",
        "_args": [
          "^\\s*(.[\\s\\S]*?)\\s*$",
          1
        ]
      }
    ]
  }
}
```

#### Parsed data

```json
{
  "price": 91.99,
  "product_title": "The Legend of Zelda: Ocarina of Time",
  "related_products": [
    "The Legend of Zelda: Majora's Mask",
    "Indiana Jones and the Infernal Machine"
  ],
  "parse_status_code": 12000
}
```

### Generating parsing instructions via API

If you want to generate many different sets of parsing instructions to cover the various websites you work with, you can build parsing instructions through the API. See the [Parsing instructions generator API](/products/cn/web-scraper-api/features/custom-parser/generating-parsing-instructions-via-api.md) to learn how it works.

## Browser instructions builder

You can use OxyCopilot to build complex page-interaction scripts without analyzing the site structure or manually writing your [browser instructions](/products/cn/web-scraper-api/features/js-rendering-and-browser-control.md).

### How it works

#### **Step 1: Provide a URL and a prompt**

* **URL:** Provide a single URL to generate browser instructions from. OxyCopilot uses the HTML of the provided URL to determine how to script the web interactions you need.
* **Prompt:** The prompt is essential for building browser instructions. Clearly state which actions should be performed once the page opens (for example: "Scroll to the bottom of the page, wait for the 'Next page' button to load, click the 'Next page' button").

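#### **Step 2 \[optional]: Adjust the browser instructions**

This step lets you fine-tune the browser instruction sequence to better fit your needs or to troubleshoot issues.

#### **Browser instructions overview**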

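Once OxyCopilot finishes processing your input, it displays the browser instruction sequence it created. You can adjust the sequence by editing, adding, or removing steps.

#### **Step 3: Save the instructions and integrate them into your scraping job**

Once you are satisfied with the instructions, you can save them to your Web Scraper API Playground session, adjust other request parameters, test, and then copy the complete request code in your preferred programming language.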
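For illustration, the example prompt from step 1 (scroll to the bottom, wait for the "Next page" button, click it) might translate into a sequence like the one below. The action names (`scroll_to_bottom`, `wait_for_element`, `click`), the selector shape, the XPath, the `render` and `browser_instructions` fields, and the target URL are all assumptions made for this sketch; the sequence OxyCopilot generates for your page is the reference.

```python
import json

# Hypothetical selector for the "Next page" button on the target page.
NEXT_BUTTON = {"type": "xpath", "value": "//a[@rel='next']"}

# One dictionary per step, executed in order (assumed action names).
browser_instructions = [
    {"type": "scroll_to_bottom"},
    {"type": "wait_for_element", "selector": NEXT_BUTTON, "timeout_s": 10},
    {"type": "click", "selector": NEXT_BUTTON},
]


def describe(steps: list) -> list:
    """Return the action names in execution order, for a quick sanity check."""
    return [step["type"] for step in steps]


payload = {
    "source": "universal",  # assumed source name for arbitrary URLs
    "url": "https://example.com/listing",  # placeholder URL
    "render": "html",  # assumed flag that enables JS rendering
    "browser_instructions": browser_instructions,  # assumed field name
}

print(describe(browser_instructions))
print(json.dumps(payload, indent=2))
```

Listing the steps as plain data mirrors the editable sequence shown in the overview: reordering, adding, or removing a step is a change to one list element.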

{% hint style="success" %}
We welcome your feedback and suggestions for improvement. Feel free to contact us or reach out to our 24/7 live chat support.
{% endhint %}