# Custom Parser

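Custom Parser is a free feature of Web Scraper API that lets you **define your own parsing and data processing logic**, which is then executed on the raw HTML result. You can generate a parser automatically with AI or write one manually for advanced use cases.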

For detailed instructions and examples, see the following pages:

<a href="/pages/9a3d539e7682a61637af93b46b5fa6f5d60476b6" class="button secondary" data-icon="flag-checkered">Getting started</a>  <a href="/pages/b093a36bab08203ddafe804951101a0956b1f771" class="button secondary" data-icon="brain-circuit">Generating parsers via API</a>  <a href="/pages/1e3db72f0477ab03a936eb1ec00e8a6ea5e7c930" class="button secondary" data-icon="layer-group">Parser presets</a>

<a href="/pages/b24e6794fffe58ec1b514c05c9ed8e72106ba036" class="button secondary" data-icon="code">Writing instructions manually</a>  <a href="/pages/b210c7fde99e3f92815c49348b7c1d7ea1c4d44c" class="button secondary" data-icon="list-ul">List of parsing functions</a>

***

## Quick start

### 1. Generate a parser

We recommend starting with our AI-powered [**OxyCopilot**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/web-scraper-api-playground/oxycopilot) tool, which lets you generate scrapers and parsers without writing any code.

{% hint style="success" %}
To access OxyCopilot, log in to the [**Oxylabs dashboard**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/web-scraper-api-playground/oxycopilot) and select **Web Scraper API Playground** in the left-side menu.
{% endhint %}

Follow the steps shown in the video to **generate a parser**:

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FMv1sqaKQeb6ZUqst9Ehp%2Fgenerate_parser.mp4?alt=media&token=9e35fa02-842d-48da-bb52-4e2c7f9d186e>" %}

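Here are the same steps shown in the video:

1. **Enter the URL** you want to scrape and parse
2. **Specify any parameters**, such as JavaScript rendering
3. **Write a prompt** describing what you want to parse
4. **Run** OxyCopilot

Once you are satisfied with the generated parser, load the instructions.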

### 2. Save the parser as a preset

OxyCopilot lets you save a generated parser for later use. See the steps below:

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FrZw97isbhLa2Du9V5oKd%2Fsave_preset.mp4?alt=media&token=d7e9c4b5-755c-4175-9cb5-83c29ec37810>" %}

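1. **Assign the preset** to a specific API user
2. Click **Save**
3. **Enter a preset name** and, optionally, a description

Once the preset is saved, you can use it in your API requests.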

### 3. Use the parser in an API request

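To use your preset with Web Scraper API, send a payload with the `parser_preset` parameter set to the name of your preset. The code examples below reuse `example_parser`, the preset created in the previous steps.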

{% tabs %}
{% tab title="cURL" %}

```shell
curl 'https://realtime.oxylabs.io/v1/queries' \
--user 'USERNAME:PASSWORD' \
-H 'Content-Type: application/json' \
-d '{
        "source": "universal",
        "url": "https://example.com/",
        "parse": true,
        "parser_preset": "example_parser"
    }'
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
from pprint import pprint


# Set the parser preset to use.
payload = {
    'source': 'universal',
    'url': 'https://example.com/',
    'parse': True,
    'parser_preset': 'example_parser'
}

# Get a response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload
)

# Print prettified response to stdout.
pprint(response.json())
```

{% endtab %}

{% tab title="Node.js" %}

```javascript
const https = require("https");

const username = "USERNAME";
const password = "PASSWORD";
const body = {
    source: "universal",
    url: "https://example.com/",
    parse: true,
    parser_preset: "example_parser"
};

const options = {
    hostname: "realtime.oxylabs.io",
    path: "/v1/queries",
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization:
            "Basic " + Buffer.from(`${username}:${password}`).toString("base64"),
    },
};

const request = https.request(options, (response) => {
    let data = "";

    response.on("data", (chunk) => {
        data += chunk;
    });

    response.on("end", () => {
        const responseData = JSON.parse(data);
        console.log(JSON.stringify(responseData, null, 2));
    });
});

request.on("error", (error) => {
    console.error("Error:", error);
});

request.write(JSON.stringify(body));
request.end();
```

{% endtab %}

{% tab title="HTTP" %}

```http
# The whole string you submit has to be URL-encoded.

https://realtime.oxylabs.io/v1/queries?source=universal&url=https%3A%2F%2Fexample.com%2F&parse=true&parser_preset=example_parser&access_token=12345abcde
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php

$params = array(
    'source' => 'universal',
    'url' => 'https://example.com/',
    'parse' => true,
    'parser_preset' => 'example_parser'
);

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "USERNAME" . ":" . "PASSWORD");

$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);
echo $result;

if (curl_errno($ch)) {
    echo 'Error:' . curl_error($ch);
}
curl_close($ch);
```

{% endtab %}

{% tab title="Golang" %}

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	const Username = "USERNAME"
	const Password = "PASSWORD"

	payload := map[string]interface{}{
		"source": "universal",
		"url": "https://example.com/",
		"parse": true,
		"parser_preset": "example_parser",
	}

	jsonValue, _ := json.Marshal(payload)

	client := &http.Client{}
	request, _ := http.NewRequest("POST",
		"https://realtime.oxylabs.io/v1/queries",
		bytes.NewBuffer(jsonValue),
	)

	// Declare the JSON body and authenticate with API user credentials.
	request.Header.Set("Content-Type", "application/json")
	request.SetBasicAuth(Username, Password)

	// Stop on a transport error instead of dereferencing a nil response.
	response, err := client.Do(request)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	defer response.Body.Close()

	responseText, _ := io.ReadAll(response.Body)
	fmt.Println(string(responseText))
}

```

{% endtab %}

{% tab title="C#" %}

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

namespace OxyApi
{
    class Program
    {
        static async Task Main()
        {
            const string Username = "USERNAME";
            const string Password = "PASSWORD";

            var parameters = new {
                source = "universal",
                url = "https://example.com/",
                parse = true,
                parser_preset = "example_parser"
            };

            var client = new HttpClient();

            Uri baseUri = new Uri("https://realtime.oxylabs.io");
            client.BaseAddress = baseUri;

            var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/v1/queries");
            requestMessage.Content = JsonContent.Create(parameters);

            var authenticationString = $"{Username}:{Password}";
            var base64EncodedAuthenticationString = Convert.ToBase64String(System.Text.ASCIIEncoding.UTF8.GetBytes(authenticationString));
            requestMessage.Headers.Add("Authorization", "Basic " + base64EncodedAuthenticationString);

            var response = await client.SendAsync(requestMessage);
            var contents = await response.Content.ReadAsStringAsync();

            Console.WriteLine(contents);
        }
    }
}
```

{% endtab %}

{% tab title="Java" %}

```java
package org.example;

import okhttp3.*;
import org.json.JSONObject;
import java.util.concurrent.TimeUnit;

public class Main implements Runnable {
    private static final String AUTHORIZATION_HEADER = "Authorization";
    public static final String USERNAME = "USERNAME";
    public static final String PASSWORD = "PASSWORD";

    public void run() {
        JSONObject jsonObject = new JSONObject();
        jsonObject.put("source", "universal");
        jsonObject.put("url", "https://example.com/");
        jsonObject.put("parse", true);
        jsonObject.put("parser_preset", "example_parser");

        Authenticator authenticator = (route, response) -> {
            String credential = Credentials.basic(USERNAME, PASSWORD);
            return response
                    .request()
                    .newBuilder()
                    .header(AUTHORIZATION_HEADER, credential)
                    .build();
        };

        var client = new OkHttpClient.Builder()
                .authenticator(authenticator)
                .readTimeout(180, TimeUnit.SECONDS)
                .build();

        var mediaType = MediaType.parse("application/json; charset=utf-8");
        var body = RequestBody.create(jsonObject.toString(), mediaType);
        var request = new Request.Builder()
                .url("https://realtime.oxylabs.io/v1/queries")
                .post(body)
                .build();

        try (var response = client.newCall(request).execute()) {
            if (response.body() != null) {
                try (var responseBody = response.body()) {
                    System.out.println(responseBody.string());
                }
            }
        } catch (Exception exception) {
            System.out.println("Error: " + exception.getMessage());
        }

        System.exit(0);
    }

    public static void main(String[] args) {
        new Thread(new Main()).start();
    }
}
```

{% endtab %}

{% tab title="JSON" %}

```json
{
    "source": "universal",
    "url": "https://example.com/",
    "parse": true,
    "parser_preset": "example_parser"
}
```

{% endtab %}
{% endtabs %}
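
The HTTP example above notes that the whole query string must be URL-encoded. As a minimal sketch of how that string could be assembled, the snippet below uses Python's standard `urllib.parse.urlencode`. The helper name `build_query_url` is hypothetical, and the `access_token` value is the placeholder from the HTTP example, not a real credential.

```python
from urllib.parse import urlencode

# Endpoint taken from the HTTP example above.
BASE_URL = "https://realtime.oxylabs.io/v1/queries"


def build_query_url(params: dict) -> str:
    """Return BASE_URL with params appended as a URL-encoded query string."""
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    return f"{BASE_URL}?{urlencode(params)}"


params = {
    "source": "universal",
    "url": "https://example.com/",
    "parse": "true",
    "parser_preset": "example_parser",
    "access_token": "12345abcde",  # placeholder value, as in the HTTP example
}

print(build_query_url(params))
```

Building the query from a dictionary keeps the target URL readable in code and leaves the encoding of `https://example.com/` to the standard library.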

<details>

<summary>Output example</summary>

```json
{
  "results": [
    {
      "content": {
        "title": "Example Domain",
        "parse_status_code": 12000
      },
      "created_at": "2025-10-24 10:04:59",
      "updated_at": "2025-10-24 10:05:00",
      "page": 1,
      "url": "https://example.com/",
      "job_id": "7387428891226308609",
      "is_render_forced": false,
      "status_code": 200,
      "type": "parsed",
      "parser_type": "preset",
      "parser_preset": "example_parser"
    }
  ]
}
```

</details>
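
A response shaped like the output example can be post-processed in a few lines. The sketch below is illustrative only: the function name `extract_parsed_content` is hypothetical, and treating `12000` as the success value for `parse_status_code` is an assumption based on the sample output, so check the parser documentation for the full list of codes.

```python
# Assumption: 12000 is the value shown in the sample output; treating it
# as "parsed successfully" is an illustrative choice.
ASSUMED_SUCCESS_CODE = 12000


def extract_parsed_content(response_json: dict) -> list:
    """Collect the parsed `content` of every result whose parse status matches."""
    parsed = []
    for result in response_json.get("results", []):
        content = result.get("content")
        # Skip results whose content is not a parsed dictionary.
        if not isinstance(content, dict):
            continue
        if content.get("parse_status_code") == ASSUMED_SUCCESS_CODE:
            parsed.append(content)
    return parsed


# Trimmed copy of the output example.
sample = {
    "results": [
        {
            "content": {"title": "Example Domain", "parse_status_code": 12000},
            "url": "https://example.com/",
            "status_code": 200,
            "type": "parsed",
            "parser_preset": "example_parser",
        }
    ]
}

print(extract_parsed_content(sample))
```

Checking the parse status separately from the HTTP-level `status_code` helps distinguish a page that was fetched but not parsed as expected from one that failed to load.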

## Getting the HTML content of a parsed job

You can also retrieve the raw HTML result by appending `?type=raw` to the end of the results retrieval URL. Read more [**here**](/products/cn/web-scraper-api/integration-methods/push-pull.md#endpoints).
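
As a rough sketch, the `type=raw` parameter can be added to a results URL with Python's standard library. The helper name `with_raw_type` is hypothetical, and the example URL is an assumed placeholder built from the job ID in the sample output; take the actual results URL from the linked endpoint documentation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_raw_type(results_url: str) -> str:
    """Return results_url with the query parameter type=raw set."""
    parts = urlsplit(results_url)
    # Keep any existing query parameters and set (or overwrite) `type`.
    query = dict(parse_qsl(parts.query))
    query["type"] = "raw"
    return urlunsplit(parts._replace(query=urlencode(query)))


# Assumed placeholder; substitute the results URL returned for your job.
example_url = "https://data.oxylabs.io/v1/queries/7387428891226308609/results"

print(with_raw_type(example_url))
```

Parsing the URL rather than concatenating a string avoids producing a second `?` when the results URL already carries query parameters.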


