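# Custom Parser

Custom Parser is a free feature of Web Scraper API that lets you **create parsing and data processing logic** that runs on raw HTML results. You can generate parsers automatically with AI, or write them manually for advanced use cases.

For detailed instructions and examples, see the following pages: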

<a href="/pages/9a3d539e7682a61637af93b46b5fa6f5d60476b6" class="button secondary" data-icon="flag-checkered">Getting started</a>  <a href="/pages/b093a36bab08203ddafe804951101a0956b1f771" class="button secondary" data-icon="brain-circuit">Generate parsers via API</a>  <a href="/pages/1e3db72f0477ab03a936eb1ec00e8a6ea5e7c930" class="button secondary" data-icon="layer-group">Parser presets</a>

<a href="/pages/b24e6794fffe58ec1b514c05c9ed8e72106ba036" class="button secondary" data-icon="code">Writing instructions manually</a>  <a href="/pages/b210c7fde99e3f92815c49348b7c1d7ea1c4d44c" class="button secondary" data-icon="list-ul">List of parsing functions</a>

***

## Quick start

### 1. Generate a parser

We recommend starting with our AI-powered [**OxyCopilot**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/web-scraper-api-playground/oxycopilot) tool, which lets you generate scrapers and parsers without writing any code.

{% hint style="success" %}
To access OxyCopilot, log in to the [**Oxylabs dashboard**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/web-scraper-api-playground/oxycopilot) and select **Web Scraper API Playground** in the left-hand menu.
{% endhint %}

Follow the steps shown in the video to **generate a parser**:

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FMv1sqaKQeb6ZUqst9Ehp%2Fgenerate_parser.mp4?alt=media&token=9e35fa02-842d-48da-bb52-4e2c7f9d186e>" %}

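These are the same steps shown in the video:

1. **Enter the URL(s)** you want to scrape and parse
2. **Specify any parameters**, such as JavaScript rendering
3. **Write a prompt** describing what you want to parse
4. **Run** OxyCopilot

Once you are satisfied with the generated parser, load the instructions.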

### 2. Save the parser as a preset

With OxyCopilot, you can save the generated parser for later use. See the steps below:

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FrZw97isbhLa2Du9V5oKd%2Fsave_preset.mp4?alt=media&token=d7e9c4b5-755c-4175-9cb5-83c29ec37810>" %}

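1. **Assign the preset** to a specific API user
2. Click **Save**
3. **Enter a preset name** and, optionally, a description

Once the preset is saved, you can use it in your API requests.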

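### 3. Use the parser in an API request

To use your preset with Web Scraper API, send a payload with the `parser_preset` parameter set to your preset name and `parse` set to `true`. The code examples below reuse the `example_parser` preset created in the previous steps.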

{% tabs %}
{% tab title="cURL" %}

```shell
curl 'https://realtime.oxylabs.io/v1/queries' \
--user 'USERNAME:PASSWORD' \
-H 'Content-Type: application/json' \
-d '{
        "source": "universal",
        "url": "https://example.com/",
        "parse": true,
        "parser_preset": "example_parser"
    }'
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
from pprint import pprint


# Set the parser preset to use.
payload = {
    'source': 'universal',
    'url': 'https://example.com/',
    'parse': True,
    'parser_preset': 'example_parser'
}

# Get the response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload
)

# Print the prettified response to stdout.
pprint(response.json())
```

{% endtab %}

{% tab title="Node.js" %}

```javascript
const https = require("https");

const username = "USERNAME";
const password = "PASSWORD";
const body = {
    source: "universal",
    url: "https://example.com/",
    parse: true,
    parser_preset: "example_parser"
};

const options = {
    hostname: "realtime.oxylabs.io",
    path: "/v1/queries",
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization:
            "Basic " + Buffer.from(`${username}:${password}`).toString("base64"),
    },
};

const request = https.request(options, (response) => {
    let data = "";

    response.on("data", (chunk) => {
        data += chunk;
    });

    response.on("end", () => {
        const responseData = JSON.parse(data);
        console.log(JSON.stringify(responseData, null, 2));
    });
});

request.on("error", (error) => {
    console.error("Error:", error);
});

request.write(JSON.stringify(body));
request.end();
```

{% endtab %}

{% tab title="HTTP" %}

```http
# The entire string you submit must be URL-encoded.

https://realtime.oxylabs.io/v1/queries?source=universal&url=https%3A%2F%2Fexample.com%2F&parse=true&parser_preset=example_parser&access_token=12345abcde
```
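
The URL-encoding requirement above can be handled programmatically rather than by hand. The following is a minimal sketch using only the Python standard library; the helper name `build_request_url` is hypothetical, and the parameter names and placeholder token are taken from the example URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://realtime.oxylabs.io/v1/queries"


def build_request_url(target_url, preset_name, access_token):
    """Return the GET request URL with every parameter URL-encoded.

    Hypothetical helper for illustration; the parameter names mirror
    the example URL shown above.
    """
    params = {
        "source": "universal",
        "url": target_url,
        "parse": "true",
        "parser_preset": preset_name,
        "access_token": access_token,
    }
    # urlencode percent-encodes reserved characters such as ":" and "/".
    return f"{BASE_URL}?{urlencode(params)}"


print(build_request_url("https://example.com/", "example_parser", "12345abcde"))
```

Letting `urlencode` do the escaping avoids mistakes when the target URL itself contains query strings or other reserved characters.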

{% endtab %}

{% tab title="PHP" %}

```php
<?php

$params = array(
    'source' => 'universal',
    'url' => 'https://example.com/',
    'parse' => true,
    'parser_preset' => 'example_parser'
);

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "USERNAME" . ":" . "PASSWORD");

$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);
echo $result;

if (curl_errno($ch)) {
    echo 'Error:' . curl_error($ch);
}
curl_close($ch);
```

{% endtab %}

{% tab title="Golang" %}

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	const Username = "USERNAME"
	const Password = "PASSWORD"

	payload := map[string]interface{}{
		"source":        "universal",
		"url":           "https://example.com/",
		"parse":         true,
		"parser_preset": "example_parser",
	}

	jsonValue, err := json.Marshal(payload)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	client := &http.Client{}
	request, err := http.NewRequest("POST",
		"https://realtime.oxylabs.io/v1/queries",
		bytes.NewBuffer(jsonValue),
	)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	request.SetBasicAuth(Username, Password)
	request.Header.Set("Content-Type", "application/json")

	response, err := client.Do(request)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	defer response.Body.Close()

	responseText, err := io.ReadAll(response.Body)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(string(responseText))
}

```

{% endtab %}

{% tab title="C#" %}

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

namespace OxyApi
{
    class Program
    {
        static async Task Main()
        {
            const string Username = "USERNAME";
            const string Password = "PASSWORD";

            var parameters = new {
                source = "universal",
                url = "https://example.com/",
                parse = true,
                parser_preset = "example_parser"
            };

            var client = new HttpClient();

            Uri baseUri = new Uri("https://realtime.oxylabs.io");
            client.BaseAddress = baseUri;

            var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/v1/queries");
            requestMessage.Content = JsonContent.Create(parameters);

            var authenticationString = $"{Username}:{Password}";
            var base64EncodedAuthenticationString = Convert.ToBase64String(System.Text.ASCIIEncoding.UTF8.GetBytes(authenticationString));
            requestMessage.Headers.Add("Authorization", "Basic " + base64EncodedAuthenticationString);

            var response = await client.SendAsync(requestMessage);
            var contents = await response.Content.ReadAsStringAsync();

            Console.WriteLine(contents);
        }
    }
}
```

{% endtab %}

{% tab title="Java" %}

```java
package org.example;

import okhttp3.*;
import org.json.JSONObject;
import java.util.concurrent.TimeUnit;

public class Main implements Runnable {
    private static final String AUTHORIZATION_HEADER = "Authorization";
    public static final String USERNAME = "USERNAME";
    public static final String PASSWORD = "PASSWORD";

    public void run() {
        JSONObject jsonObject = new JSONObject();
        jsonObject.put("source", "universal");
        jsonObject.put("url", "https://example.com/");
        jsonObject.put("parse", true);
        jsonObject.put("parser_preset", "example_parser");

        Authenticator authenticator = (route, response) -> {
            String credential = Credentials.basic(USERNAME, PASSWORD);
            return response
                    .request()
                    .newBuilder()
                    .header(AUTHORIZATION_HEADER, credential)
                    .build();
        };

        var client = new OkHttpClient.Builder()
                .authenticator(authenticator)
                .readTimeout(180, TimeUnit.SECONDS)
                .build();

        var mediaType = MediaType.parse("application/json; charset=utf-8");
        var body = RequestBody.create(jsonObject.toString(), mediaType);
        var request = new Request.Builder()
                .url("https://realtime.oxylabs.io/v1/queries")
                .post(body)
                .build();

        try (var response = client.newCall(request).execute()) {
            if (response.body() != null) {
                try (var responseBody = response.body()) {
                    System.out.println(responseBody.string());
                }
            }
        } catch (Exception exception) {
            System.out.println("Error: " + exception.getMessage());
        }

        System.exit(0);
    }

    public static void main(String[] args) {
        new Thread(new Main()).start();
    }
}
```

{% endtab %}

{% tab title="JSON" %}

```json
{
    "source": "universal",
    "url": "https://example.com/",
    "parse": true,
    "parser_preset": "example_parser"
}
```

{% endtab %}
{% endtabs %}

<details>

<summary>Output sample</summary>

```json
{
  "results": [
    {
      "content": {
        "title": "Example Domain",
        "parse_status_code": 12000
      },
      "created_at": "2025-10-24 10:04:59",
      "updated_at": "2025-10-24 10:05:00",
      "page": 1,
      "url": "https://example.com/",
      "job_id": "7387428891226308609",
      "is_render_forced": false,
      "status_code": 200,
      "type": "parsed",
      "parser_type": "preset",
      "parser_preset": "example_parser"
    }
  ]
}
```

</details>
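
A response of this shape can be post-processed to pull out only the parsed fields. The sketch below is illustrative rather than an official client: the function name `extract_parsed_content` is hypothetical, and it assumes that a `parse_status_code` of `12000` marks a successful parse, as the sample output suggests.

```python
def extract_parsed_content(response_json, success_code=12000):
    """Collect parsed fields from every successfully parsed result.

    Assumption: `success_code` (12000) indicates a successful parse,
    based on the sample output only.
    """
    parsed = []
    for result in response_json.get("results", []):
        content = result.get("content")
        if not isinstance(content, dict):
            continue  # content is not a parsed object (e.g. raw HTML)
        if content.get("parse_status_code") != success_code:
            continue  # parsing did not report the assumed success code
        # Drop the status field and keep only the parsed data.
        parsed.append(
            {key: value for key, value in content.items()
             if key != "parse_status_code"}
        )
    return parsed


# Trimmed copy of the sample response structure.
sample = {
    "results": [
        {
            "content": {"title": "Example Domain", "parse_status_code": 12000},
            "status_code": 200,
            "parser_preset": "example_parser",
        }
    ]
}

print(extract_parsed_content(sample))  # prints [{'title': 'Example Domain'}]
```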

## Get the HTML content of a parsed job

You can also retrieve the raw HTML result by appending `?type=raw` to the end of the result retrieval URL. Learn more [**here**](/products/cn/web-scraper-api/integration-methods/push-pull.md#endpoints).
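
As a rough illustration, the `type=raw` parameter can be appended to a results URL like this. The helper name `with_raw_type` is hypothetical, and the results URL pattern in the example is an assumption; check the linked push-pull page for the actual endpoint.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_raw_type(results_url):
    """Return `results_url` with `type=raw` set in its query string.

    Existing query parameters are preserved; an existing `type`
    parameter is overwritten.
    """
    parts = urlsplit(results_url)
    query = dict(parse_qsl(parts.query))
    query["type"] = "raw"
    return urlunsplit(parts._replace(query=urlencode(query)))


# Assumed URL pattern, for illustration only; the job ID is the one
# from the output sample.
example_url = "https://data.oxylabs.io/v1/queries/7387428891226308609/results"
print(with_raw_type(example_url))
```

Building the URL this way keeps it valid when the results URL already carries other query parameters.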


