# Parse function examples

## HTML processing

### `element_text`

#### Sample HTML

```html
<!DOCTYPE html>
<html>
<body>
    <div id="product">
        <div id="product-description">This is a very good product</div>
        <div id="product-price">    12  3


        </div>
    </div>
</body>
</html>
```

**Extract text from an HTML element and strip surrounding whitespace**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//*[@id='product-price']"]
            },
            {
                "_fn": "element_text"
            }
        ]
    }
}
```

```json
{
    "price": "12  3"
}
```
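
For readers who think in code, the behaviour above can be approximated in plain Python. This is only a sketch built on the standard library's `xml.etree.ElementTree`, not the parser's actual implementation, and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Well-formed fragment mirroring the sample HTML above.
html = """<div id="product">
    <div id="product-description">This is a very good product</div>
    <div id="product-price">    12  3


    </div>
</div>"""

root = ET.fromstring(html)

# Roughly what `xpath_one` does here: take the first matching element.
price_element = root.find(".//*[@id='product-price']")

# Roughly what `element_text` does: collect the element text and trim both ends.
raw_text = "".join(price_element.itertext())
price = raw_text.strip()

print(repr(price))  # '12  3'
```

Only leading and trailing whitespace is removed; the two spaces inside `12  3` survive, which matches the result shown above.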

**Given a string value as input, the value is returned unchanged**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//*[@id='product-price']/text()"]
            },
            {
                "_fn": "element_text"
            }
        ]
    }
}
```

```json
{
    "price": "    12  3\n\n\n        "
}
```

### `xpath`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">Very</li>
                <li class="description-item">Nice</li>
                <li class="description-item">Socks</li>
            </ul>
        </div>
    </div>
</body>
```

**Get all description items**

```json
{
    "description_items": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            }
        ]
    }
}
```

```json
{
    "description_items": ["Very", "Nice", "Socks"]
}
```

**Get the first description item**

```json
{
    "first_description_item": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["(//li[@class='description-item'])[1]/text()"]
            }
        ]
    }
}
```

```json
{
    "first_description_item": [
        "Very"
    ]
}
```

**Check whether the description section element exists**

```json
{
    "description_section_exists": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["boolean(//div[@class='description'])"]
            }
        ]
    }
}
```

```json
{
    "description_section_exists": true
}
```

**Get the price as a number**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["number(//div[@class='price'])"]
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```

**Multiple expressions as fallbacks in case earlier ones fail**

Here the first expression finds nothing, so the second one supplies the price.

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [
                    "//div[@class='product-price']/text()",
                    "//div[@class='price']/text()"
                ]
            }
        ]
    }
}
```

```json
{
    "price": [
        "123.12"
    ]
}
```
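
The fallback behaviour can be pictured as "try each expression in order and keep the first non-empty result". The sketch below illustrates that idea with the standard library; `first_non_empty` is a hypothetical helper, not part of the parser. Because `xml.etree.ElementTree` does not support `/text()`, it selects elements and reads their `.text` instead.

```python
import xml.etree.ElementTree as ET

html = """<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">123.12</div>
    </div>
</body>"""


def first_non_empty(root, expressions):
    """Try each path in order; return the texts of the first one that matches."""
    for expression in expressions:
        matches = root.findall(expression)
        if matches:
            return [element.text for element in matches]
    return []


root = ET.fromstring(html)
price = first_non_empty(
    root,
    [".//div[@class='product-price']", ".//div[@class='price']"],
)
print(price)  # ['123.12']
```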

**Use the XPath `|` operator to match multiple expressions**

```json
{
    "price_and_title": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//div[@class='price']/text() | //div[@class='title']/text()"]
            }
        ]
    }
}
```

```json
{
    "price_and_title": [
        "Socks",
        "123.12"
    ]
}
```

### `xpath_one`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">Very</li>
                <li class="description-item">Nice</li>
                <li class="description-item">Socks</li>
            </ul>
        </div>
    </div>
</body>
```

**Return the first match**

```json
{
    "first_description_item": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//li/text()"]
            }
        ]
    }
}
```

```json
{
    "first_description_item": "Very"
}
```

**Use XPath functions**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": ["number(.//div[@class='price'])"]
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```

## String operations

### `amount_from_string`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">The price is: 123.12 pesos</div>
    </div>
</body>
```

**Extract an amount from a string**

```json
{
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='price']/text()"]
            },
            {
                "_fn": "amount_from_string"
            }
        ]
    }
}
```

```json
{
    "price": 123.12
}
```
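
A rough Python equivalent, assuming the function simply picks the first decimal number out of the text. The real `amount_from_string` may also handle currency symbols or thousands separators; `amount_from_string_sketch` is a hypothetical name used only for this illustration.

```python
import re


def amount_from_string_sketch(text):
    """Return the first decimal number found in text as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


price = amount_from_string_sketch("The price is: 123.12 pesos")
print(price)  # 123.12
```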

### `amount_range_from_string`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">
            The price is: 123.12 pesos;
            The price is: 345.12 pesos;
            The price is: 678.12 pesos
        </div>
    </div>
</body>
```

**Extract all amounts from a string**

```json
{
    "prices": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='price']/text()"]
            },
            {
                "_fn": "amount_range_from_string"
            }
        ]
    }
}
```

```json
{
    "prices": [
        123.12,
        345.12,
        678.12
    ]
}
```
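
The same idea extends to several amounts in one string. The snippet below is a sketch under the assumption that every decimal number in the text counts as an amount; it does not reproduce the parser's own matching rules.

```python
import re

text = """
    The price is: 123.12 pesos;
    The price is: 345.12 pesos;
    The price is: 678.12 pesos
"""

# Collect every decimal number and convert each one to a float.
prices = [float(number) for number in re.findall(r"\d+(?:\.\d+)?", text)]
print(prices)  # [123.12, 345.12, 678.12]
```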

### `join`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">
            The price is: 123.12 pesos;
        </div>
        <div class="price">
            The price is: 345.12 pesos;
        </div>
        <div class="price">
            The price is: 678.12 pesos
        </div>
    </div>
</body>
```

**Join an array of strings into a single string**

`normalize-space()` is applied in a second `xpath` step because calling it in the first pipeline function would return only the first value.

```json
{
    "price_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath",
                "_args": ["normalize-space(text())"]
            },
            {
                "_fn": "join",
                "_args": ""
            }
        ]
    }
}
```

```json
{
    "price_variants": "The price is: 123.12 pesos;The price is: 345.12 pesos;The price is: 678.12 pesos"
}
```
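
In Python terms, the last two pipeline steps amount to normalizing each string and then concatenating them with an empty separator. This is a sketch only; `" ".join(s.split())` is used here as a stand-in for XPath's `normalize-space()`.

```python
# Raw text of each price element, including the surrounding whitespace.
fragments = [
    "\n            The price is: 123.12 pesos;\n        ",
    "\n            The price is: 345.12 pesos;\n        ",
    "\n            The price is: 678.12 pesos\n        ",
]

# Trim the ends and collapse inner runs of whitespace to a single space.
normalized = [" ".join(fragment.split()) for fragment in fragments]

# Join with an empty separator, as in the `join` step above.
price_variants = "".join(normalized)
print(price_variants)
```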

### `regex_find_all` <a href="#regex_find_all" id="regex_find_all"></a>

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            [one description]
            [two description]
            [three description]
        </div>
    </div>
</body>
```

**Find all matches between two characters**

```json
{
    "descriptions": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_find_all",
                "_args": ["\\[(.*)\\]"]
            }
        ]
    }
}
```

```json
{
    "descriptions": [
        "one description",
        "two description",
        "three description"
    ]
}
```
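
The pattern behaves the same way in Python's `re` module, which makes it easy to try out. This is a sketch, assuming `regex_find_all` returns the captured groups the way `re.findall` does.

```python
import re

text = """
    [one description]
    [two description]
    [three description]
"""

# `.` does not match a newline, so the greedy `.*` stays within one line.
descriptions = re.findall(r"\[(.*)\]", text)
print(descriptions)  # ['one description', 'two description', 'three description']
```

If several bracketed items could share a single line, a non-greedy group such as `\[(.*?)\]` would be the safer choice.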

### `regex_search` <a href="#regex_search" id="regex_search"></a>

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            [one description]
            [two description]
            [three description]
            {the one i need}
        </div>
    </div>
</body>
```

**Return the description between two characters**

```json
{
    "description": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_search",
                "_args": ["{(.*)}", 1]
            }
        ]
    }
}
```

```json
{
    "description": "the one i need"
}
```
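
A comparable lookup in Python, assuming the second argument (`1`) selects the capture group to return. That reading is consistent with the result shown but is an assumption about `regex_search`. The braces are escaped here to keep the pattern unambiguous.

```python
import re

text = """
    [one description]
    [two description]
    [three description]
    {the one i need}
"""

# Find the first match and return capture group 1, the text inside the braces.
match = re.search(r"\{(.*)\}", text)
description = match.group(1) if match else None
print(description)  # the one i need
```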

### `regex_substring`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="description">
            * one description
            * two description
            * three description
            * {this one i would like to get replaced}
        </div>
    </div>
</body>
```

**Replace part of the text with a given value**

```json
{
    "descriptions": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [".//div[@class='description']/text()"]
            },
            {
                "_fn": "regex_substring",
                "_args": ["{this one i would like to get replaced}", "four description"]
            },
            {
                "_fn": "regex_find_all",
                "_args": ["\\*\\s(.*)\n"]
            }
        ]
    }
}
```

```json
{
    "descriptions": [
        "one description",
        "two description",
        "three description",
        "four description"
    ]
}
```
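
The two regex steps can be sketched in Python as a substitution followed by a find-all. This assumes `regex_substring` behaves like `re.sub`; the search text is passed through `re.escape` here so that the braces are treated literally.

```python
import re

text = (
    "\n"
    "    * one description\n"
    "    * two description\n"
    "    * three description\n"
    "    * {this one i would like to get replaced}\n"
)

# Step 1: replace the placeholder with the desired value.
replaced = re.sub(
    re.escape("{this one i would like to get replaced}"),
    "four description",
    text,
)

# Step 2: capture everything between "* " and the end of each line.
descriptions = re.findall(r"\*\s(.*)\n", replaced)
print(descriptions)
```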

## Common functions

### `length`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Get the number of price variants**

```json
{
    "price_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "length"
            }
        ]
    }
}
```

```json
{
    "price_variants": 5
}
```

**Get the number of variants in each group of a multidimensional array**

Sample HTML:

```html
<body>
    <div class="product">
        <property class="colors">
            <option class="color">Red</option>
            <option class="color">Green</option>
            <option class="color">Blue</option>
        </property>
        <property class="sizes">
            <option class="size">S</option>
            <option class="size">M</option>
            <option class="size">L</option>
            <option class="size">XL</option>
        </property>
    </div>
</body>
```

```json
{
    "number_of_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//property"]
            },
            {
                "_fn": "xpath",
                "_args": [".//option"]
            },
            {
                "_fn": "length"
            }
        ]
    }
}
```

```json
{
    "number_of_variants": [
        3,
        4
    ]
}
```

### `select_nth`

#### Sample HTML

```html
<body>
    <div class="product" id="socks">
        <div class="title">Socks</div>
        <div class="price">123.12</div>
        <div class="description">
            <ul>
                <li class="description-item">Very</li>
                <li class="description-item">Nice</li>
                <li class="description-item">Socks</li>
            </ul>
        </div>
    </div>
</body>
```

**Select the first description item from an array**

```json
{
    "first_description_item": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            },
            {
                "_fn": "select_nth",
                "_args": 0
            }
        ]
    }
}
```

```json
{
    "first_description_item": "Very"
}
```

**Select the last description item from an array**

```json
{
    "last_description_item": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//li[@class='description-item']/text()"]
            },
            {
                "_fn": "select_nth",
                "_args": -1
            }
        ]
    }
}
```

```json
{
    "last_description_item": "Socks"
}
```
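
Judging by the two examples, `select_nth` appears to follow the same zero-based indexing as Python lists, with negative indexes counting from the end. A minimal illustration under that assumption:

```python
# The list an `xpath` step would return for the description items.
items = ["Very", "Nice", "Socks"]

first_item = items[0]   # index 0 selects the first element
last_item = items[-1]   # index -1 selects the last element

print(first_item, last_item)  # Very Socks
```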

## Math functions

### `average`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Calculate the average of all listed prices**

```json
{
    "price_average": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "average"
            }
        ]
    }
}
```

```json
{
    "price_average": 244.8
}
```
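
The pipeline reads as "select every price element, convert each one to a number, then average the list". A sketch of those three steps with the Python standard library, offered as an illustration and not as the parser's implementation:

```python
import statistics
import xml.etree.ElementTree as ET

html = """<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>"""

root = ET.fromstring(html)

# Step 1 and 2: select the price elements and convert each text to a float.
prices = [float(element.text) for element in root.findall(".//div[@class='price']")]

# Step 3: average the resulting list.
price_average = statistics.mean(prices)
print(price_average)  # 244.8
```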

### `max`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Find the maximum of all listed prices**

```json
{
    "price_max": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "max"
            }
        ]
    }
}
```

```json
{
    "price_max": 456.0
}
```

### `min`

#### Sample HTML

```html
<body>
    <div class="product">
        <div class="price">123</div>
        <div class="price">124</div>
        <div class="price">456</div>
        <div class="price">421</div>
        <div class="price">100</div>
    </div>
</body>
```

**Find the minimum of all listed prices**

```json
{
    "price_min": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@class='price']"]
            },
            {
                "_fn": "xpath_one",
                "_args": ["number(text())"]
            },
            {
                "_fn": "min"
            }
        ]
    }
}
```

```json
{
    "price_min": 100.0
}
```
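
Both `max` and `min` reduce a list of numbers to a single value, much like Python's built-ins of the same name. The values below are the prices from the sample HTML, already converted to floats as the `number(text())` step would do.

```python
# Prices after the `number(text())` conversion step.
prices = [123.0, 124.0, 456.0, 421.0, 100.0]

price_max = max(prices)
price_min = min(prices)

print(price_max, price_min)  # 456.0 100.0
```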

### `product`

#### Sample HTML

```html
<body>
    <div class="product">
        <property class="colors">
            <option class="color">Red</option>
            <option class="color">Green</option>
            <option class="color">Blue</option>
        </property>
        <property class="sizes">
            <option class="size">S</option>
            <option class="size">M</option>
            <option class="size">L</option>
            <option class="size">XL</option>
        </property>
    </div>
</body>
```

**Get the number of distinct product variants**

```json
{
    "number_of_variants": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//property"]
            },
            {
                "_fn": "xpath",
                "_args": [".//option"]
            },
            {
                "_fn": "length"
            },
            {
                "_fn": "product"
            }
        ]
    }
}
```

```json
{
    "number_of_variants": 12
}
```
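
The result is the product of the per-group option counts: 3 colors times 4 sizes gives 12 combinations. The sketch below reproduces that arithmetic with the standard library (`math.prod` requires Python 3.8 or later); it illustrates the idea and is not the parser's own code.

```python
import math
import xml.etree.ElementTree as ET

html = """<body>
    <div class="product">
        <property class="colors">
            <option class="color">Red</option>
            <option class="color">Green</option>
            <option class="color">Blue</option>
        </property>
        <property class="sizes">
            <option class="size">S</option>
            <option class="size">M</option>
            <option class="size">L</option>
            <option class="size">XL</option>
        </property>
    </div>
</body>"""

root = ET.fromstring(html)

# One list of options per property, mirroring the nested array in the pipeline.
groups = [prop.findall(".//option") for prop in root.findall(".//property")]

# `length` applied per group, then `product` over the resulting counts.
lengths = [len(group) for group in groups]
number_of_variants = math.prod(lengths)

print(lengths, number_of_variants)  # [3, 4] 12
```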
