# Parsing function examples

## HTML processing

### `element_text`

#### Sample HTML

```html
<!DOCTYPE html>
<html>
<body>
    <div id="product">
        <div id="product-description">这是一个不错的产品</div>
        <div id="product-price">    12  3


        </div>
    </div>
</body>
</html>
```

**Extract text from an HTML element and strip the surrounding whitespace**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//*[@id='product-price']"]
            },
            {
                "_fn": "element_text"
            }
        ]
    }
}
```

```json
{
    "price": "12  3"
}
```

**Given a string value as input, `element_text` does nothing**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//*[@id='product-price']/text()"]
            },
            {
                "_fn": "element_text"
            }
        ]
    }
}
```

```json
{
    "price": "    12  3\n\n\n        "
}
```
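The two behaviours above can be approximated in Python. This is a minimal sketch, not the parser's actual code: `element_text_sketch` is a hypothetical name, the standard-library `xml.etree.ElementTree` stands in for the real HTML engine, and concatenating descendant text before trimming is an assumption.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<div id="product">
    <div id="product-description">这是一个不错的产品</div>
    <div id="product-price">    12  3


    </div>
</div>"""


def element_text_sketch(value):
    """Hypothetical stand-in for `element_text` (assumed behaviour)."""
    if isinstance(value, str):
        # String input is passed through untouched, whitespace included.
        return value
    # Element input: concatenate its text nodes, then trim both ends.
    return "".join(value.itertext()).strip()


root = ET.fromstring(SAMPLE)
price_element = root.find(".//*[@id='product-price']")

from_element = element_text_sketch(price_element)
from_string = element_text_sketch(price_element.text)

print(repr(from_element))  # '12  3'
print(repr(from_string))   # leading and trailing whitespace preserved
```

Only the outer whitespace is trimmed for element input; the two spaces inside `12  3` are kept, as in the first result above.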

### `xpath`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">袜子</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">非常</li>
                <li class="description-item">不错</li>
                <li class="description-item">袜子</li>
            </ul>
        </div>
    </div>
</body>
```

**Get all description items**

```json
{
    "description_items": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            }
        ]
    }
}
```

```json
{
    "description_items": ["非常", "不错", "袜子"]
}
```

**Get the first description item**

```json
{
    "first_description_item": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["(//li[@class='description-item'])[1]/text()"]
            }
        ]
    }
}
```

```json
{
    "first_description_item": [
        "非常"
    ]
}
```

**Check whether the description section element exists**

```json
{
    "description_section_exists": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["boolean(//div[@class='description'])"]
            }
        ]
    }
}
```

```json
{
    "description_section_exists": true
}
```

**Get the price as a number**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["number(//div[@class='price'])"]
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```

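**Multiple expressions, where each later expression is a fallback for when the previous one fails**

Here the first expression finds nothing, so the second one returns the target price.

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [
                    "//div[@class='product-price']/text()",
                    "//div[@class='price']/text()"
                ]
            }
        ]
    }
}
```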

```json
{
    "price": [
        "123.12"
    ]
}
```
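The fallback order can be pictured as a loop that stops at the first expression returning any match. The sketch below is an illustration under assumptions: `xpath_with_fallback` is a hypothetical helper, and because the standard-library `xml.etree.ElementTree` supports only a subset of XPath (no `text()` step), it selects elements and reads `.text` instead.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body>
    <div class="product" id="socks">
        <div class="title">袜子</div>
        <div class="price">123.12</div>
    </div>
</body>"""


def xpath_with_fallback(root, expressions):
    """Return the text of the matches for the first expression that finds anything."""
    for expression in expressions:
        matches = root.findall(expression)
        if matches:
            return [element.text for element in matches]
    # No expression matched at all.
    return []


root = ET.fromstring(SAMPLE)
price = xpath_with_fallback(
    root,
    [
        ".//div[@class='product-price']",  # finds nothing
        ".//div[@class='price']",          # finds the target price
    ],
)
print(price)  # ['123.12']
```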

**Use the XPath `|` operator to match multiple expressions**

```json
{
    "price_and_title": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//div[@class='price']/text() | //div[@class='title']/text()"]
            }
        ]
    }
}
```

```json
{
    "price_and_title": [
        "袜子",
        "123.12"
    ]
}
```

### `xpath_one`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">袜子</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">非常</li>
                <li class="description-item">不错</li>
                <li class="description-item">袜子</li>
            </ul>
        </div>
    </div>
</body>
```

**Return the first match**

```json
{
    "first_description_item": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//li/text()"]
            }
        ]
    }
}
```

```json
{
    "first_description_item": "非常"
}
```

**Use an XPath function**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": ["number(.//div[@class='price'])"]
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```

## String manipulation

### `amount_from_string`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">袜子</div>
        <div class="price">价格是：123.12 比索</div>
    </div>
</body>
```

**Extract an amount from a string**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='price']/text()"]
            },
            {
                "_fn": "amount_from_string"
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```

### `amount_range_from_string`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">
            价格是：123.12 比索；
            价格是：345.12 比索；
            价格是：678.12 比索
        </div>
    </div>
</body>
```

**Extract all amounts from a string**

```json
{
    "prices": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='price']/text()"]
            },
            {
                "_fn": "amount_range_from_string"
            }
        ]
    }
}
```

```json
{
    "prices": [
        123.12,
        345.12,
        678.12
    ]
}
```
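As a rough illustration of what extracting amounts from free text involves, the sketch below pulls every decimal number out of the sample string with a regular expression. The pattern and the `amounts_in` helper are assumptions made for this example; the real function may well handle thousands separators, currency symbols, and other formats that this sketch ignores.

```python
import re

# Assumed pattern: a run of digits with an optional decimal part.
AMOUNT_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def amounts_in(text):
    """Hypothetical helper: return every number found in `text` as a float."""
    return [float(match) for match in AMOUNT_PATTERN.findall(text)]


text = """
    价格是：123.12 比索；
    价格是：345.12 比索；
    价格是：678.12 比索
"""
prices = amounts_in(text)
print(prices)  # [123.12, 345.12, 678.12]
```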

### `join`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">
            价格是：123.12 比索；
        </div>
        <div class="price">
            价格是：345.12 比索；
        </div>
        <div class="price">
            价格是：678.12 比索
        </div>
    </div>
</body>
```

**Join an array of strings into a single string**

`normalize-space()` is called in a second pipeline step because calling it in the first `xpath` step would return only the first value.

```json
{
    "price_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath",
                "_args": ["normalize-space(text())"]
            },
            {
                "_fn": "join",
                "_args": ""
            }
        ]
    }
}
```

```json
{
    "price_variants": "价格是：123.12 比索；价格是：345.12 比索；价格是：678.12 比索"
}
```
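The normalize-then-join sequence can be mirrored in plain Python. This is only a sketch: `normalize_space` is a hypothetical helper that approximates XPath `normalize-space()` (Python's `str.split()` treats a slightly wider set of characters as whitespace), and the raw strings are copied from the sample HTML.

```python
def normalize_space(text):
    """Rough equivalent of XPath normalize-space() for one string."""
    # Collapse whitespace runs to single spaces and trim both ends.
    return " ".join(text.split())


# Text content of the three price elements, as it appears in the sample HTML.
raw_prices = [
    "\n            价格是：123.12 比索；\n        ",
    "\n            价格是：345.12 比索；\n        ",
    "\n            价格是：678.12 比索\n        ",
]

# Normalize each element separately, then join with an empty separator.
normalized = [normalize_space(price) for price in raw_prices]
price_variants = "".join(normalized)
print(price_variants)
```

Normalizing per element is what keeps all three values; applying the function once to the whole node set would keep only the first.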

### `regex_find_all` <a href="#regex_find_all" id="regex_find_all"></a>

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            [一个描述]
            [两个描述]
            [三个描述]
        </div>
    </div>
</body>
```

**Find all matches between two characters**

```json
{
    "descriptions": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_find_all",
                "_args": ["\\[(.*)\\]"]
            }
        ]
    }
}
```

```json
{
    "descriptions": [
        "一个描述",
        "两个描述",
        "三个描述"
    ]
}
```
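The pattern relies on `.` not matching a newline, so the greedy `(.*)` stays within one line. The sketch below reproduces this with Python's `re.findall`, on the assumption that `regex_find_all` follows the same regex semantics; it also shows why a lazy `(.*?)` is safer when several bracketed items share a line.

```python
import re

text = """
            [一个描述]
            [两个描述]
            [三个描述]
        """

# One bracketed item per line: the greedy group cannot cross the newline.
descriptions = re.findall(r"\[(.*)\]", text)
print(descriptions)  # ['一个描述', '两个描述', '三个描述']

# Several items on one line: greedy matching swallows everything in between.
one_line = "[a] [b]"
greedy = re.findall(r"\[(.*)\]", one_line)
lazy = re.findall(r"\[(.*?)\]", one_line)
print(greedy)  # ['a] [b']
print(lazy)    # ['a', 'b']
```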

### `regex_search` <a href="#regex_search" id="regex_search"></a>

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            [一个描述]
            [两个描述]
            [三个描述]
            {我需要的那个}
        </div>
    </div>
</body>
```

**Return the description between two characters**

```json
{
    "description": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_search",
                "_args": ["{(.*)}", 1]
            }
        ]
    }
}
```

```json
{
    "description": "我需要的那个"
}
```
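For comparison, the same lookup with Python's `re.search` is sketched below. It assumes that the second argument (`1`) selects capture group 1, and it writes the braces in escaped form, which matches the same text as the unescaped pattern in the instruction.

```python
import re

text = """
            [一个描述]
            [两个描述]
            [三个描述]
            {我需要的那个}
        """

# Search for the first brace-delimited span and keep capture group 1.
match = re.search(r"\{(.*)\}", text)
description = match.group(1) if match else None
print(description)  # 我需要的那个
```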

### `regex_substring`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            * 一个描述
            * 两个描述
            * 三个描述
            * {我想替换的这个}
        </div>
    </div>
</body>
```

**Replace part of the text with a given value**

```json
{
    "descriptions": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_substring",
                "_args": ["{我想替换的这个}", "四个描述"]
            },
            {
                "_fn": "regex_find_all",
                "_args": ["\\*\\s(.*)\n"]
            }
        ]
    }
}
```

```json
{
    "descriptions": [
        "一个描述",
        "两个描述",
        "三个描述",
        "四个描述"
    ]
}
```
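The replace-then-extract pipeline maps onto `re.sub` followed by `re.findall` in Python. This sketch assumes `regex_substring` substitutes every match of the pattern with the replacement string, which is how the example above reads; it is not taken from the parser's source.

```python
import re

text = """
            * 一个描述
            * 两个描述
            * 三个描述
            * {我想替换的这个}
        """

# Step 1: swap the placeholder for the desired value.
replaced = re.sub(r"{我想替换的这个}", "四个描述", text)

# Step 2: capture everything after "* " up to the end of each line.
descriptions = re.findall(r"\*\s(.*)\n", replaced)
print(descriptions)  # ['一个描述', '两个描述', '三个描述', '四个描述']
```

The trailing `\n` in the second pattern means a final item is only captured if a newline follows it, as is the case in the sample HTML.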

## Common functions

### `length`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Get the number of price variants**

```json
{
    "price_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "length"
            }
        ]
    }
}
```

```json
{
    "price_variants": 5
}
```

**Get the number of variants in each array of a multidimensional array**

Sample HTML:

```html
<body>
    <div class="product">
        <property class="colors">
            <option class="color">红色</option>
            <option class="color">绿色</option>
            <option class="color">蓝色</option>
        </property>
        <property class="sizes">
            <option class="size">S</option>
            <option class="size">M</option>
            <option class="size">L</option>
            <option class="size">XL</option>
        </property>
    </div>
</body>
```

```json
{
    "number_of_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//property"]
            },
            {
                "_fn": "xpath",
                "_args": [".//option"]
            },
            {
                "_fn": "length"
            }
        ]
    }
}
```

```json
{
    "number_of_variants": [
        3,
        4
    ]
}
```

### `select_nth`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">袜子</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">非常</li>
                <li class="description-item">不错</li>
                <li class="description-item">袜子</li>
            </ul>
        </div>
    </div>
</body>
```

**Select the first description item from the array**

```json
{
    "first_description_item": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            },
            {
                "_fn": "select_nth",
                "_args": 0
            }
        ]
    }
}
```

```json
{
    "first_description_item": "非常"
}
```

**Select the last description item from the array**

```json
{
    "last_description_item": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            },
            {
                "_fn": "select_nth",
                "_args": -1
            }
        ]
    }
}
```

```json
{
    "last_description_item": "袜子"
}
```

## Math functions

### `average`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Compute the average of all listed prices**

```json
{
    "price_average": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "average"
            }
        ]
    }
}
```

```json
{
    "price_average": 244.8
}
```

### `max`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Find the maximum of all listed prices**

```json
{
    "price_max": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "max"
            }
        ]
    }
}
```

```json
{
    "price_max": 456.0
}
```

### `min`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Find the minimum of all listed prices**

```json
{
    "price_min": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "min"
            }
        ]
    }
}
```

```json
{
    "price_min": 100.0
}
```
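The results of the `average`, `max`, and `min` examples can be checked by hand with a few lines of Python. This is only a verification sketch over the five sample prices, written as floats because `number(text())` yields floating-point values; it says nothing about how the parser computes them internally.

```python
from statistics import mean

# The five prices from the sample HTML, as produced by number(text()).
prices = [123.0, 124.0, 456.0, 421.0, 100.0]

price_average = mean(prices)
price_max = max(prices)
price_min = min(prices)

print(price_average)  # 244.8
print(price_max)      # 456.0
print(price_min)      # 100.0
```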

### `product`

#### Sample HTML

```html
<body>
    <div class="product">
        <property class="colors">
            <option class="color">红色</option>
            <option class="color">绿色</option>
            <option class="color">蓝色</option>
        </property>
        <property class="sizes">
            <option class="size">S</option>
            <option class="size">M</option>
            <option class="size">L</option>
            <option class="size">XL</option>
        </property>
    </div>
</body>
```

**Get the number of distinct product variants**

```json
{
    "number_of_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//property"]
            },
            {
                "_fn": "xpath",
                "_args": [".//option"]
            },
            {
                "_fn": "length"
            },
            {
                "_fn": "product"
            }
        ]
    }
}
```

```json
{
    "number_of_variants": 12
}
```
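The pipeline above first produces one list of options per `property`, then takes the length of each list, then multiplies the lengths. A minimal Python sketch of those last two steps follows, assuming `length` is applied to each inner list of a nested array, as the result suggests.

```python
from math import prod

# One inner list per <property>, holding the text of its <option> elements.
options_per_property = [
    ["红色", "绿色", "蓝色"],
    ["S", "M", "L", "XL"],
]

# length: applied to each inner list.
lengths = [len(options) for options in options_per_property]
print(lengths)  # [3, 4]

# product: multiply the per-property counts to get all combinations.
number_of_variants = prod(lengths)
print(number_of_variants)  # 12
```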
