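Parsing function examples
Practical examples of custom parser functions, covering HTML processing, string manipulation, math operations, and common parsing tasks.
HTML processing
element_text
Sample HTML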
<!DOCTYPE html>
<html>
<body>
<div id="product">
<div id="product-description">This is a nice product</div>
<div id="product-price"> 12 3
</div>
</div>
</body>
</html>

Extract text from an HTML element and strip the surrounding whitespace
{
"price": {
"_fns": [
{
"_fn": "xpath_one",
"_args": [".//*[@id='product-price']"]
},
{
"_fn": "element_text"
}
]
}
}

{
"price": "12 3"
}

When the input is already a string, element_text returns it unchanged
{
"price": {
"_fns": [
{
"_fn": "xpath_one",
"_args": [".//*[@id='product-price']/text()"]
},
{
"_fn": "element_text"
}
]
}
}

{
"price": " 12 3\n\n\n "
}
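The two element_text cases can be mimicked locally. What follows is a minimal Python sketch, not the parser's actual implementation: the helper name element_text_sketch is hypothetical, and it assumes that an element is reduced to its stripped text content while a string is passed through untouched.

```python
import xml.etree.ElementTree as ET

SAMPLE = '<div id="product"><div id="product-price"> 12 3\n</div></div>'


def element_text_sketch(value):
    """Hypothetical stand-in for element_text (assumed behaviour)."""
    if isinstance(value, str):
        # A plain string, such as the result of .../text(), is returned as-is.
        return value
    # An element is flattened to its text content, then stripped.
    return "".join(value.itertext()).strip()


root = ET.fromstring(SAMPLE)
price_element = root.find(".//*[@id='product-price']")

print(repr(element_text_sketch(price_element)))       # '12 3'
print(repr(element_text_sketch(price_element.text)))  # ' 12 3\n'
```

Selecting the element rather than its text() node is what lets the whitespace be stripped in this sketch.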
xpath
Sample HTML
<body>
<div class="product" id="socks">
<div class="title">Socks</div>
<div class="price">123.12</div>
<div class="description">
<ul>
<li class="description-item">Very</li>
<li class="description-item">Nice</li>
<li class="description-item">Socks</li>
</ul>
</div>
</div>
</body>

Get all description items
{
"description_items": {
"_fns": [
{
"_fn": "xpath",
"_args": ["//li[@class='description-item']/text()"]
}
]
}
}

{
"description_items": ["Very", "Nice", "Socks"]
}

Get the first description item
{
"first_description_item": {
"_fns": [
{
"_fn": "xpath",
"_args": ["(//li[@class='description-item'])[1]/text()"]
}
]
}
}

{
"first_description_item": [
"Very"
]
}

Check whether the description section element exists
{
"description_section_exists": {
"_fns": [
{
"_fn": "xpath",
"_args": ["boolean(//div[@class='description'])"]
}
]
}
}

{
"description_section_exists": true
}

Get the price as a number
{
"price": {
"_fns": [
{
"_fn": "xpath",
"_args": ["number(//div[@class='price'])"]
}
]
}
}

{
"price": 123.12
}

Fall back to the next expression when an earlier one finds nothing
{
"price": {
"_fns": [
{
"_fn": "xpath",
"_args": [
"//div[@class='product-price']/text()", <--- this does not find anything
"//div[@class='price']/text()" <--- this finds the target price
]
}
]
}
}

{
"price": [
"123.12"
]
}

Use the XPath | operator to match several expressions at once
{
"price_and_title": {
"_fns": [
{
"_fn": "xpath",
"_args": ["//div[@class='price']/text() | //div[@class='title']/text()"]
}
]
}
}

{
"price_and_title": [
"Socks",
"123.12"
]
}
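The multi-expression fallback shown earlier in the xpath section can be pictured as trying each expression in order and keeping the first non-empty result. The sketch below is only an illustration under that assumption: first_matching is a hypothetical helper, and because Python's standard ElementTree supports only a subset of XPath, the expressions select elements instead of text() nodes.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
  <div class="product" id="socks">
    <div class="title">Socks</div>
    <div class="price">123.12</div>
  </div>
</body>
"""


def first_matching(root, expressions):
    """Return the text of the nodes found by the first expression that matches."""
    for expression in expressions:
        matches = root.findall(expression)
        if matches:
            return [node.text for node in matches]
    return []


root = ET.fromstring(SAMPLE.strip())
prices = first_matching(
    root,
    [
        ".//div[@class='product-price']",  # finds nothing
        ".//div[@class='price']",          # finds the target price
    ],
)
print(prices)  # ['123.12']
```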
xpath_one
Sample HTML
<body>
<div class="product" id="socks">
<div class="title">Socks</div>
<div class="price">123.12</div>
<div class="description">
<ul>
<li class="description-item">Very</li>
<li class="description-item">Nice</li>
<li class="description-item">Socks</li>
</ul>
</div>
</div>
</body>

Return the first match
{
"first_description_item": {
"_fns": [
{
"_fn": "xpath_one",
"_args": [".//li/text()"]
}
]
}
}

{
"first_description_item": "Very"
}

Use XPath functions
{
"price": {
"_fns": [
{
"_fn": "xpath_one",
"_args": ["number(.//div[@class='price'])"]
}
]
}
}

{
"price": 123.12
}

String manipulation
amount_from_string
Sample HTML
<body>
<div class="product" id="socks">
<div class="title">Socks</div>
<div class="price">The price is: 123.12 pesos</div>
</div>
</body>

Extract an amount from a string
{
"price": {
"_fns": [
{
"_fn": "xpath_one",
"_args": [".//div[@class='price']/text()"]
},
{
"_fn": "amount_from_string"
}
]
}
}

{
"price": 123.12
}
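To make the amount_from_string step concrete, here is a rough Python approximation. It is a sketch under assumptions, not the real function: amount_from_string_sketch is a hypothetical name, and the pattern only handles plain decimals with a dot separator, so thousands separators or currency-specific formats are out of scope.

```python
import re


def amount_from_string_sketch(text):
    """Hypothetical stand-in: return the first number in text as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


print(amount_from_string_sketch("The price is: 123.12 pesos"))  # 123.12
```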
amount_range_from_string
Sample HTML
<body>
<div class="product">
<div class="price">
The price is: 123.12 pesos;
The price is: 345.12 pesos;
The price is: 678.12 pesos
</div>
</div>
</body>

Extract all amounts from a string
{
"prices": {
"_fns": [
{
"_fn": "xpath_one",
"_args": [".//div[@class='price']/text()"]
},
{
"_fn": "amount_range_from_string"
}
]
}
}

{
"prices": [
123.12,
345.12,
678.12
]
}
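The amount_range_from_string result can be approximated by collecting every number in the text instead of only the first. As before, this is a hedged sketch: amount_range_from_string_sketch is a hypothetical helper and the pattern assumes plain dot-separated decimals.

```python
import re

PRICE_TEXT = """
The price is: 123.12 pesos;
The price is: 345.12 pesos;
The price is: 678.12 pesos
"""


def amount_range_from_string_sketch(text):
    """Hypothetical stand-in: return every number found in text as a float."""
    return [float(number) for number in re.findall(r"\d+(?:\.\d+)?", text)]


print(amount_range_from_string_sketch(PRICE_TEXT))  # [123.12, 345.12, 678.12]
```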
join
Sample HTML
<body>
<div class="product">
<div class="price">
The price is: 123.12 pesos;
</div>
<div class="price">
The price is: 345.12 pesos;
</div>
<div class="price">
The price is: 678.12 pesos
</div>
</div>
</body>

Join an array of strings into a single string
{
"price_variants": {
"_fns": [
{
"_fn": "xpath",
"_args": [".//div[@class='price']"]
},
{ // Calling normalize-space() in the first pipeline function
// would return only the first value.
"_fn": "xpath",
"_args": ["normalize-space(text())"]
},
{
"_fn": "join",
"_args": ""
}
]
}
}

{
"price_variants": "The price is: 123.12 pesos;The price is: 345.12 pesos;The price is: 678.12 pesos"
}
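The join pipeline normalizes each price string separately and then concatenates the results with an empty separator. The Python sketch below mirrors that order of operations; normalize_space is a hypothetical helper that only approximates XPath's normalize-space(), and the raw strings are assumed to carry the line breaks from the sample HTML.

```python
def normalize_space(text):
    """Collapse whitespace runs and trim, roughly like XPath's normalize-space()."""
    return " ".join(text.split())


raw_prices = [
    "\nThe price is: 123.12 pesos;\n",
    "\nThe price is: 345.12 pesos;\n",
    "\nThe price is: 678.12 pesos\n",
]

# Normalize every element first, then join with the empty separator "".
joined = "".join(normalize_space(price) for price in raw_prices)
print(joined)
# The price is: 123.12 pesos;The price is: 345.12 pesos;The price is: 678.12 pesos
```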
regex_find_all
Sample HTML
<body>
<div class="product">
<div class="description">
[one description]
[two description]
[three description]
</div>
</div>
</body>

Find all matches between two characters
{
"descriptions": {
"_fns": [
{
"_fn": "xpath_one",
"_args": [".//div[@class='description']/text()"]
},
{
"_fn": "regex_find_all",
"_args": ["\\[(.*)\\]"]
}
]
}
}

{
"descriptions": [
"one description",
"two description",
"three description"
]
}
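The regex_find_all pattern can be tried out with Python's re module, assuming the parser's regex semantics are similar. The greedy (.*) works here because the dot does not cross line breaks and each line holds a single bracket pair; for several pairs on one line, a lazy (.*?) is the safer choice.

```python
import re

description = """
[one description]
[two description]
[three description]
"""

# With one capture group, findall returns the captured text of each match.
matches = re.findall(r"\[(.*)\]", description)
print(matches)  # ['one description', 'two description', 'three description']

# Greedy versus lazy when two pairs share a line.
print(re.findall(r"\[(.*)\]", "[a] [b]"))   # ['a] [b']
print(re.findall(r"\[(.*?)\]", "[a] [b]"))  # ['a', 'b']
```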
regex_search
Sample HTML
<body>
<div class="product">
<div class="description">
[one description]
[two description]
[three description]
{the one i need}
</div>
</div>
</body>

Return the description found between two characters
{
"description": {
"_fns": [
{
"_fn": "xpath_one",
"_args": [".//div[@class='description']/text()"]
},
{
"_fn": "regex_search",
"_args": ["{(.*)}", 1]
}
]
}
}

{
"description": "the one i need"
}
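In the regex_search example, the second argument (1) appears to select the first capture group rather than the whole match. The sketch below shows that distinction with Python's re module; treating the argument as a group index is an assumption, and the braces are escaped here only for clarity.

```python
import re

description = """
[one description]
[two description]
[three description]
{the one i need}
"""

match = re.search(r"\{(.*)\}", description)

whole_match = match.group(0) if match else None  # includes the braces
first_group = match.group(1) if match else None  # text inside the braces

print(whole_match)  # {the one i need}
print(first_group)  # the one i need
```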
regex_substring
Sample HTML
<body>
<div class="product">
<div class="description">
* one description
* two description
* three description
* {this one i would like to get replaced}
</div>
</div>
</body>

Replace part of the text with a specified value
{
"descriptions": {
"_fns": [
{
"_fn": "xpath_one",
"_args": [".//div[@class='description']/text()"]
},
{
"_fn": "regex_substring",
"_args": ["{this one i would like to get replaced}", "four description"]
},
{
"_fn": "regex_find_all",
"_args": ["\\*\\s(.*)\n"]
}
]
}
}

{
"descriptions": [
"one description",
"two description",
"three description",
"four description"
]
}

Common functions
length
Sample HTML
<body>
<div class="product">
<div class="price">123</div>
<div class="price">124</div>
<div class="price">456</div>
<div class="price">421</div>
<div class="price">100</div>
</div>
</body>

Get the number of price variants
{
"price_variants": {
"_fns": [
{
"_fn": "xpath",
"_args": [".//div[@class='price']"]
},
{
"_fn": "length"
}
]
}
}

{
"price_variants": 5
}

Count the options of each property in a nested (multidimensional) array
Sample HTML
<body>
<div class="product">
<property class="colors">
<option class="color">Red</option>
<option class="color">Green</option>
<option class="color">Blue</option>
</property>
<property class="sizes">
<option class="size">S</option>
<option class="size">M</option>
<option class="size">L</option>
<option class="size">XL</option>
</property>
</div>
</body>

{
"number_of_variants": {
"_fns": [
{
"_fn": "xpath",
"_args": [".//property"]
},
{
"_fn": "xpath",
"_args": [".//option"]
},
{
"_fn": "length"
}
]
}
}

{
"number_of_variants": [
3,
4
]
}
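The nested case of length is easier to follow with the intermediate data written out. In this sketch the second xpath step is assumed to yield one inner list per property, and length_sketch is a hypothetical helper that counts each inner list separately while still returning a single number for a flat list.

```python
options_per_property = [
    ["Red", "Green", "Blue"],  # colors
    ["S", "M", "L", "XL"],     # sizes
]


def length_sketch(value):
    """Count a flat list, or each inner list of a nested one (assumed behaviour)."""
    if value and all(isinstance(item, list) for item in value):
        return [len(item) for item in value]
    return len(value)


print(length_sketch(options_per_property))                 # [3, 4]
print(length_sketch(["123", "124", "456", "421", "100"]))  # 5
```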
select_nth
Sample HTML
<body>
<div class="product" id="socks">
<div class="title">Socks</div>
<div class="price">123.12</div>
<div class="description">
<ul>
<li class="description-item">Very</li>
<li class="description-item">Nice</li>
<li class="description-item">Socks</li>
</ul>
</div>
</div>
</body>

Select the first description item from an array
{
"price_and_title": {
"_fns": [
{
"_fn": "xpath",
"_args": ["//li[@class='description-item']/text()"]
},
{
"_fn": "select_nth",
"_args": 0
}
]
}
}

{
"price_and_title": "Very"
}

Select the last description item from an array
{
"price_and_title": {
"_fns": [
{
"_fn": "xpath",
"_args": ["//li[@class='description-item']/text()"]
},
{
"_fn": "select_nth",
"_args": -1
}
]
}
}

{
"price_and_title": "Socks"
}

Math functions
average
Sample HTML
<body>
<div class="product">
<div class="price">123</div>
<div class="price">124</div>
<div class="price">456</div>
<div class="price">421</div>
<div class="price">100</div>
</div>
</body>

Compute the average of all listed prices
{
"price_average": {
"_fns": [
{
"_fn": "xpath",
"_args": [".//div[@class='price']"]
},
{
"_fn": "xpath_one",
"_args": ["number(text())"]
},
{
"_fn": "average"
}
]
}
}

{
"price_average": 244.8
}
max
Sample HTML
<body>
<div class="product">
<div class="price">123</div>
<div class="price">124</div>
<div class="price">456</div>
<div class="price">421</div>
<div class="price">100</div>
</div>
</body>

Find the maximum of all listed prices
{
"price_max": {
"_fns": [
{
"_fn": "xpath",
"_args": [".//div[@class='price']"]
},
{
"_fn": "xpath_one",
"_args": ["number(text())"]
},
{
"_fn": "max"
}
]
}
}

{
"price_max": 456.0
}
min
Sample HTML
<body>
<div class="product">
<div class="price">123</div>
<div class="price">124</div>
<div class="price">456</div>
<div class="price">421</div>
<div class="price">100</div>
</div>
</body>

Find the minimum of all listed prices
{
"price_min": {
"_fns": [
{
"_fn": "xpath",
"_args": [".//div[@class='price']"]
},
{
"_fn": "xpath_one",
"_args": ["number(text())"]
},
{
"_fn": "min"
}
]
}
}

{
"price_min": 100.0
}
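The average, max, and min examples share one shape: select the price elements, convert each element's text to a number, then aggregate the list. A plain-Python equivalent of the last two steps, using the five sample prices, could look like this; it is a sketch for checking the expected results, not the parser's own code.

```python
from statistics import mean

price_texts = ["123", "124", "456", "421", "100"]

# Per-element conversion, comparable to applying number(text()) to each node.
prices = [float(text) for text in price_texts]

print(mean(prices))  # 244.8
print(max(prices))   # 456.0
print(min(prices))   # 100.0
```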
product
Sample HTML
<body>
<div class="product">
<property class="colors">
<option class="color">Red</option>
<option class="color">Green</option>
<option class="color">Blue</option>
</property>
<property class="sizes">
<option class="size">S</option>
<option class="size">M</option>
<option class="size">L</option>
<option class="size">XL</option>
</property>
</div>
</body>

Get the number of distinct product variants
{
"number_of_variants": {
"_fns": [
{
"_fn": "xpath",
"_args": [".//property"]
},
{
"_fn": "xpath",
"_args": [".//option"]
},
{
"_fn": "length"
},
{
"_fn": "product"
}
]
}
}

{
"number_of_variants": 12
}
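Multiplying the per-property counts gives the number of variants because every color can be combined with every size. The following sketch checks that reasoning by enumerating the combinations; the nested list is an assumed representation of what the two xpath steps produce.

```python
from itertools import product
from math import prod

options_per_property = [
    ["Red", "Green", "Blue"],  # colors
    ["S", "M", "L", "XL"],     # sizes
]

counts = [len(options) for options in options_per_property]
variants = list(product(*options_per_property))

print(counts)         # [3, 4]
print(prod(counts))   # 12
print(len(variants))  # 12
print(variants[0])    # ('Red', 'S')
```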

