
# Parsing function examples

## HTML processing

### `element_text`

#### Sample HTML

```html
<!DOCTYPE html>
<html>
<body>
    <div id="product">
        <div id="product-description">This is a nice product</div>
        <div id="product-price">    12  3


        </div>
    </div>
</body>
</html>
```

**Extract text from an HTML element and strip surrounding whitespace**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//*[@id='product-price']"]
            },
            {
                "_fn": "element_text"
            }
        ]
    }
}
```

```json
{
    "price": "12  3"
}
```

**Do nothing when given a string value as input**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//*[@id='product-price']/text()"]
            },
            {
                "_fn": "element_text"
            }
        ]
    }
}
```

```json
{
    "price": "    12  3\n\n\n        "
}
```
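
The two results above differ only in what `element_text` receives. The following is a minimal Python sketch of that behavior, assuming the function trims the text of an element and passes a string through untouched. `element_text_sketch` is a hypothetical name and `xml.etree.ElementTree` stands in for whatever parser the service uses; neither comes from the product.

```python
import xml.etree.ElementTree as ET


def element_text_sketch(value):
    """Hypothetical stand-in for `element_text`, not the service's code."""
    if isinstance(value, str):
        # String input is returned unchanged, whitespace included.
        return value
    # Element input: collect all nested text, then trim both ends.
    return "".join(value.itertext()).strip()


price_element = ET.fromstring('<div id="product-price">    12  3\n\n\n        </div>')
from_element = element_text_sketch(price_element)
from_string = element_text_sketch("    12  3\n\n\n        ")
```

Under these assumptions `from_element` is trimmed while `from_string` keeps its leading and trailing whitespace. Inner whitespace (the two spaces in `12  3`) is preserved in both cases.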

### `xpath`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">Very</li>
                <li class="description-item">Nice</li>
                <li class="description-item">Socks</li>
            </ul>
        </div>
    </div>
</body>
```

**Get all description items**

```json
{
    "description_items": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            }
        ]
    }
}
```

```json
{
    "description_items": ["Very", "Nice", "Socks"]
}
```

**Get the first description item**

```json
{
    "first_description_item": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["(//li[@class='description-item'])[1]/text()"]
            }
        ]
    }
}
```

```json
{
    "first_description_item": [
        "Very"
    ]
}
```

**Check whether the description section element exists**

```json
{
    "description_section_exists": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["boolean(//div[@class='description'])"]
            }
        ]
    }
}
```

```json
{
    "description_section_exists": true
}
```

**Get the price as a number**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["number(//div[@class='price'])"]
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```

**Multiple expressions to fall back on when the current one fails**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [
                    "//div[@class='product-price']/text()",  // this does not find anything
                    "//div[@class='price']/text()"  // this finds the target price
                ]
            }
        ]
    }
}
```

```json
{
    "price": [
        "123.12"
    ]
}
```
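
The fallback behavior above can be pictured as "try each expression in order and keep the first non-empty result". The sketch below shows that idea in Python under stated assumptions: `first_matching` is a hypothetical helper, and `xml.etree.ElementTree` supports only a subset of XPath, so the expressions select elements rather than `text()` nodes.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">123.12</div>
    </div>
</body>"""


def first_matching(root, expressions):
    """Return the texts found by the first expression that matches anything."""
    for expression in expressions:
        texts = [element.text for element in root.findall(expression)]
        if texts:
            return texts
    return []


root = ET.fromstring(SAMPLE)
price = first_matching(
    root,
    [
        ".//div[@class='product-price']",  # matches nothing in SAMPLE
        ".//div[@class='price']",          # matches the price element
    ],
)
```

Here `price` ends up holding the text of the second expression's match, because the first expression returns no elements.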

**XPath `|` operator for matching multiple expressions**

```json
{
    "price_and_title": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//div[@class='price']/text() | //div[@class='title']/text()"]
            }
        ]
    }
}
```

```json
{
    "price_and_title": [
        "Socks",
        "123.12"
    ]
}
```

### `xpath_one`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">Very</li>
                <li class="description-item">Nice</li>
                <li class="description-item">Socks</li>
            </ul>
        </div>
    </div>
</body>
```

**Return the first match**

```json
{
    "first_description_item": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//li/text()"]
            }
        ]
    }
}
```

```json
{
    "first_description_item": "Very"
}
```

**Use XSLT functions**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": ["number(.//div[@class='price'])"]
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```

## String manipulation

### `amount_from_string`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">This is the price: 123.12 pesos</div>
    </div>
</body>
```

**Extract an amount from a string**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='price']/text()"]
            },
            {
                "_fn": "amount_from_string"
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```

### `amount_range_from_string`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">
            This is the price: 123.12 pesos;
            This is the price: 345.12 pesos;
            This is the price: 678.12 pesos
        </div>
    </div>
</body>
```

**Extract all amounts from a string**

```json
{
    "prices": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='price']/text()"]
            },
            {
                "_fn": "amount_range_from_string"
            }
        ]
    }
}
```

```json
{    
    "prices": [
        123.12,
        345.12,
        678.12
    ]
}
```
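
To make the difference between the single-amount and multi-amount functions concrete, here is a rough Python sketch. It assumes an amount is a run of digits with an optional decimal part; the real functions may also handle currency symbols, thousands separators, or other formats that this page does not describe. Both function names and `AMOUNT_PATTERN` are hypothetical.

```python
import re

# Assumed number shape: digits with an optional decimal part.
AMOUNT_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def amount_from_string_sketch(text):
    """Return the first amount found in the text, or None."""
    match = AMOUNT_PATTERN.search(text)
    return float(match.group()) if match else None


def amount_range_from_string_sketch(text):
    """Return every amount found in the text, in order of appearance."""
    return [float(found) for found in AMOUNT_PATTERN.findall(text)]


single = amount_from_string_sketch("Price: 123.12 pesos")
several = amount_range_from_string_sketch("123.12 pesos; 345.12 pesos; 678.12 pesos")
```

In this sketch `single` is one float and `several` is a list of three floats, mirroring the shapes of the two outputs shown above.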

### `join`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">
            This is the price: 123.12 pesos;
        </div>
        <div class="price">
            This is the price: 345.12 pesos;
        </div>
        <div class="price">
            This is the price: 678.12 pesos
        </div>
    </div>
</body>
```

**Join an array of strings into a single string**

```json
{
    "price_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {  // If normalize-space() were called in the first pipeline function,
               // it would return only the first value.
                "_fn": "xpath",
                "_args": ["normalize-space(text())"]
            },
            {
                "_fn": "join",
                "_args": ""
            }
        ]
    }
}
```

```json
{
    "price_variants": "This is the price: 123.12 pesos;This is the price: 345.12 pesos;This is the price: 678.12 pesos"
}
```
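
The pipeline above normalizes each element's text separately and then joins the results with an empty separator. A small Python sketch of the same two steps follows; the input strings are illustrative, and `normalize_space` only approximates XPath `normalize-space()` because Python's `str.split()` treats a slightly wider set of characters as whitespace.

```python
def normalize_space(text):
    # Trim both ends and collapse inner whitespace runs to one space.
    return " ".join(text.split())


# Illustrative raw text nodes, one per matched element.
raw_texts = [
    "\n            Price: 123.12 pesos;\n        ",
    "\n            Price: 345.12 pesos;\n        ",
    "\n            Price: 678.12 pesos\n        ",
]

# Normalize each value first, then join with an empty separator.
price_variants = "".join(normalize_space(text) for text in raw_texts)
```

Normalizing per element is what keeps all three values; joining with `""` places them back to back without any added space.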

### `regex_find_all` <a href="#regex_find_all" id="regex_find_all"></a>

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            [One description]
            [Two descriptions]
            [Three descriptions]
        </div>
    </div>
</body>
```

**Find all matches between two characters**

```json
{
    "descriptions": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_find_all",
                "_args": ["\\[(.*)\\]"]
            }
        ]
    }
}
```

```json
{
    "descriptions": [
        "One description",
        "Two descriptions",
        "Three descriptions"
    ]
}
```
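
The pattern in this example can be tried locally with Python's `re` module, assuming the service follows similar regex semantics (this page does not state which engine is used). The text below is illustrative and shaped like the sample HTML's text node.

```python
import re

description_text = (
    "\n"
    "            [One description]\n"
    "            [Two descriptions]\n"
    "            [Three descriptions]\n"
    "        "
)

# With a single capture group, findall returns the group contents.
# `.` does not cross newlines, so each line yields its own match.
descriptions = re.findall(r"\[(.*)\]", description_text)
```

Because `.` stops at line breaks by default, the greedy `.*` cannot swallow several bracketed items at once as long as each item sits on its own line.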

### `regex_search` <a href="#regex_search" id="regex_search"></a>

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            [One description]
            [Two descriptions]
            [Three descriptions]
            {The one I need}
        </div>
    </div>
</body>
```

**Return the description between two characters**

```json
{
    "description": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_search",
                "_args": ["{(.*)}", 1]
            }
        ]
    }
}
```

```json
{
    "description": "The one I need"
}
```
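
As a rough local equivalent, the search above can be reproduced with Python's `re.search`. This sketch assumes that the second argument (`1`) selects capture group 1, which matches the output shown but is not spelled out on this page. The braces are escaped here for clarity, and the input text is illustrative.

```python
import re

description_text = "[One description]\n[Two descriptions]\n{The one I need}\n"

# Search for the first brace-delimited span and keep capture group 1.
match = re.search(r"\{(.*)\}", description_text)
description = match.group(1) if match else None
```

Unlike a find-all, a search stops at the first match and returns a single value rather than a list.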

### `regex_substring`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            * One description
            * Two descriptions
            * Three descriptions
            * {I want to replace this}
        </div>
    </div>
</body>
```

**Replace part of the text with a specified value**

```json
{
    "descriptions": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_substring",
                "_args": ["{I want to replace this}", "Four descriptions"]
            },
            {
                "_fn": "regex_find_all",
                "_args": ["\\*\\s(.*)\n"]
            }
        ]
    }
}
```

```json
{
    "descriptions": [
        "One description",
        "Two descriptions",
        "Three descriptions",
        "Four descriptions"
    ]
}
```
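
The replace-then-extract pipeline above maps onto two `re` calls in Python. This is a sketch with illustrative text, assuming `regex_substring` behaves like a regex substitution; the service's exact semantics are not documented on this page.

```python
import re

description_text = (
    "\n"
    "            * One description\n"
    "            * Two descriptions\n"
    "            * Three descriptions\n"
    "            * {I want to replace this}\n"
    "        "
)

# Step 1: swap the placeholder for the replacement value.
replaced = re.sub(r"\{I want to replace this\}", "Four descriptions", description_text)

# Step 2: capture everything after "* " up to the end of each line.
descriptions = re.findall(r"\*\s(.*)\n", replaced)
```

The second pattern requires a trailing newline, so the last bullet is captured only because the text continues onto another line before the closing tag.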

## Common functions

### `length`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Get the number of price variants**

```json
{
    "price_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "length"
            }
        ]
    }
}
```

```json
{
    "price_variants": 5
}
```

**Get the number of variants in a multidimensional array**

Sample HTML:

```html
<body>
    <div class="product">
        <property class="colors">
            <option class="color">Red</option>
            <option class="color">Green</option>
            <option class="color">Blue</option>
        </property>
        <property class="sizes">
            <option class="size">S</option>
            <option class="size">M</option>
            <option class="size">L</option>
            <option class="size">XL</option>
        </property>
    </div>
</body>
```

```json
{
    "number_of_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//property"]
            },
            {
                "_fn": "xpath",
                "_args": [".//option"]
            },
            {
                "_fn": "length"
            }
        ]
    }
}
```

```json
{
    "number_of_variants": [
        3,
        4
    ]
}
```

### `select_nth`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">Very</li>
                <li class="description-item">Nice</li>
                <li class="description-item">Socks</li>
            </ul>
        </div>
    </div>
</body>
```

**Select the first description item from the array**

```json
{
    "price_and_title": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            },
            {
                "_fn": "select_nth",
                "_args": 0
            }
        ]
    }
}
```

```json
{
    "price_and_title": "Very"
}
```

**Select the last description item from the array**

```json
{
    "price_and_title": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            },
            {
                "_fn": "select_nth",
                "_args": -1
            }
        ]
    }
}
```

```json
{
    "price_and_title": "Socks"
}
```
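
Judging by the two examples above, `select_nth` uses zero-based indexes and accepts negative values that count from the end, much like Python list indexing. A minimal sketch of that reading, using the description items from the sample HTML:

```python
description_items = ["Very", "Nice", "Socks"]

first_item = description_items[0]   # index 0 selects the first element
last_item = description_items[-1]   # index -1 selects the last element
```

How the function treats an out-of-range index is not shown on this page, so the sketch does not cover that case.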

## Math functions

### `average`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Calculate the average of all listed prices**

```json
{
    "price_average": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "average"
            }
        ]
    }
}
```

```json
{
    "price_average": 244.8
}
```

### `max`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Find the maximum of all listed prices**

```json
{
    "price_max": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "max"
            }
        ]
    }
}
```

```json
{
    "price_max": 456.0
}
```

### `min`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Find the minimum of all listed prices**

```json
{
    "price_min": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "min"
            }
        ]
    }
}
```

```json
{
    "price_min": 100.0
}
```
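
The `average`, `max`, and `min` results for this price list can be cross-checked with a few lines of Python. The prices are written as floats because `number(text())` yields floating-point values, which is also why the outputs above read `456.0` and `100.0`.

```python
import statistics

# Prices from the sample HTML, as floats.
prices = [123.0, 124.0, 456.0, 421.0, 100.0]

price_average = statistics.mean(prices)  # sum 1224.0 divided by 5 items
price_max = max(prices)
price_min = min(prices)
```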

### `product`

#### Sample HTML

```html
<body>
    <div class="product">
        <property class="colors">
            <option class="color">Red</option>
            <option class="color">Green</option>
            <option class="color">Blue</option>
        </property>
        <property class="sizes">
            <option class="size">S</option>
            <option class="size">M</option>
            <option class="size">L</option>
            <option class="size">XL</option>
        </property>
    </div>
</body>
```

**Get the number of distinct product variants**

```json
{
    "number_of_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//property"]
            },
            {
                "_fn": "xpath",
                "_args": [".//option"]
            },
            {
                "_fn": "length"
            },
            {
                "_fn": "product"
            }
        ]
    }
}
```

```json
{
    "number_of_variants": 12
}
```
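
The result of 12 comes from counting the options of each property and multiplying the counts. The Python sketch below walks through the same arithmetic on a nested list that mirrors the sample HTML; the variable names are illustrative only.

```python
import math

# One inner list per <property>, holding its <option> values.
options_per_property = [
    ["Red", "Green", "Blue"],
    ["S", "M", "L", "XL"],
]

# `length` applied to a nested array gives one count per inner array.
lengths = [len(options) for options in options_per_property]

# `product` multiplies those counts together.
number_of_variants = math.prod(lengths)
```

Three colors times four sizes gives twelve combinations, which matches the output above.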


