# Parsing instruction examples

The sections below apply example parsing instructions to the following HTML snippet.

### Sample HTML <a href="#sample-html" id="sample-html"></a>

```html
<body>
    <div id="products">
        <div class="product" id="shoes">
            <div class="title">Shoes</div>
            <div class="price">223.12</div>
            <div class="description">
                <ul>
                    <li class="description-item">Super</li>
                </ul>
            </div>
        </div>
        <div class="product" id="pants">
            <div class="title">Pants</div>
            <div class="price">60.12</div>
            <div class="description">
                <ul>
                    <li class="description-item">Amazing</li>
                    <li class="description-item">Quality</li>
                </ul>
            </div>
        </div>
        <div class="product" id="socks">
            <div class="title">Socks</div>
            <div class="price">123.12</div>
            <div class="description">
                <ul>
                    <li class="description-item">Very</li>
                    <li class="description-item">Nice</li>
                    <li class="description-item">Socks</li>
                </ul>
            </div>
        </div>
    </div>
</body>
```

### Bare minimum <a href="#bare-minimum" id="bare-minimum"></a>

{% hint style="info" %}
**Use case:** you want to extract the text of every **shoes description item**.
{% endhint %}

*Example 1. Selecting shoes description items using XPath.*

```json
{
    "shoes_description": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [
                    ".//div[@id='shoes']//li[@class='description-item']/text()"
                ]
            }
        ]
    }
}
```

The `xpath` function finds a single matching item here and returns it as a string inside a list:

```json
{
    "shoes_description": [
        "Super"
    ]
}
```

The exact `xpath` function behavior is described [**here**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/custom-parser/writing-instructions-manually/list-of-functions).
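
To make the list-returning behavior concrete, here is a minimal Python sketch that emulates it with the standard library. It is not how Custom Parser is implemented. The helper name `xpath_texts` is hypothetical, and because `xml.etree.ElementTree` supports only a subset of XPath (no `text()`), the sketch reads each element's `.text` instead.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the Sample HTML (shoes and pants only).
HTML = """
<body>
    <div class="product" id="shoes">
        <ul><li class="description-item">Super</li></ul>
    </div>
    <div class="product" id="pants">
        <ul>
            <li class="description-item">Amazing</li>
            <li class="description-item">Quality</li>
        </ul>
    </div>
</body>
"""


def xpath_texts(root, product_id):
    """Hypothetical helper: text of every description item of one product.

    Always returns a list, even when only one item matches.
    """
    path = f".//div[@id='{product_id}']//li[@class='description-item']"
    return [li.text for li in root.findall(path)]


root = ET.fromstring(HTML)
shoes_description = xpath_texts(root, "shoes")
pants_description = xpath_texts(root, "pants")

print(shoes_description)  # ['Super']
print(pants_description)  # ['Amazing', 'Quality']
```

A single match still comes back wrapped in a list, which mirrors the `shoes_description` result shown above.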

### Nested parsing instructions <a href="#nested-parsing-instructions" id="nested-parsing-instructions"></a>

{% hint style="info" %}
**Use case:** you want to parse all information related to shoes, and the parsed result should mirror the structure of the provided HTML.
{% endhint %}

You are targeting this part of the Sample HTML:

```html
<div class="product" id="shoes">
    <div class="title">Shoes</div>
    <div class="price">223.12</div>
    <div class="description">
        <ul>
            <li class="description-item">Super</li>
        </ul>
    </div>
</div>
```

You would like the parsed result to have the following structure:

```json
{
    "shoes": {
        "title": "Shoes",
        "price": "223.12",
        "description": [
            "Super"
        ]
    }
}
```

Parsing instructions would look as follows.

*Example 2. Parsing instructions for extracting* `shoes` *information.*

```json
{
    "shoes": {
        "title": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//div[@id='shoes']/div[@class='title']/text()"]
                }
            ]
        },
        "price": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//div[@id='shoes']/div[@class='price']/text()"]
                }
            ]
        },
        "description": {
            "_fns": [
                {
                    "_fn": "xpath",
                    "_args": ["//div[@id='shoes']//li[@class='description-item']/text()"]
                }
            ]
        }
    }
}
```

`xpath_one` works similarly to `xpath`, but instead of returning a list of all matches, it **returns the first matched item**.

In the example above, the `shoes` property is the only property defined in the outermost instructions scope. The `shoes` property contains nested parsing instructions.

The `shoes` instructions scope does not have a pipeline defined (`_fns` property is missing). This means pipelines defined in `title`, `price`, and `description` scopes will use the document-under-parse as a pipeline input.

In Example 2, you can see a repetition of `//div[@id='shoes']` in XPath expressions. The repetition can be avoided by defining a pipeline in `shoes` scope:

*Example 3. Defining a pipeline in* `shoes` *scope instructions to avoid XPath expression repetition.*

```json
{
    "shoes": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": ["//div[@id='shoes']"]
            }
        ],
        "title": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["./div[@class='title']/text()"]
                }
            ]
        },
        "price": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["./div[@class='price']/text()"]
                }
            ]
        },
        "description": {
            "_fns": [
                {
                    "_fn": "xpath",
                    "_args": [".//li[@class='description-item']/text()"]
                }
            ]
        }
    }
}
```

By using the parsing instructions provided in Example 3, Custom Parser will:

1. Start with processing `shoes._fns` pipeline, which will output the `shoes` HTML element;
2. Take the `shoes._fns` pipeline output and use it as the input for the pipelines defined in the `title`, `price`, and `description` scopes;
3. Process `title`, `price`, and `description` pipelines to produce final values.

The result is the same as in Example 2:

```json
{
    "shoes": {
        "title": "Shoes",
        "price": "223.12",
        "description": [
            "Super"
        ]
    }
}
```

The main difference between Example 2 and Example 3 is that in Example 3, a pipeline is defined in the `shoes` scope. **This additional pipeline selects the shoes element and passes it on to the pipelines found deeper in the instructions hierarchy.**
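
The scoping idea in Example 3 can be sketched in plain Python. This is only an illustration under stated assumptions, not Custom Parser code: it uses `xml.etree.ElementTree` from the standard library, and since that module has no `text()` support, it reads `.text` from the matched elements.

```python
import xml.etree.ElementTree as ET

# Shortened Sample HTML: the pants product is kept to show that it is ignored.
HTML = """
<body>
    <div id="products">
        <div class="product" id="shoes">
            <div class="title">Shoes</div>
            <div class="price">223.12</div>
            <div class="description">
                <ul><li class="description-item">Super</li></ul>
            </div>
        </div>
        <div class="product" id="pants">
            <div class="title">Pants</div>
            <div class="price">60.12</div>
        </div>
    </div>
</body>
"""

root = ET.fromstring(HTML)

# Parent pipeline (the role of shoes._fns): select the shoes element once.
shoes_element = root.find(".//div[@id='shoes']")

# Child pipelines: relative paths are evaluated against the parent's output,
# not against the whole document, so the pants values can never leak in.
shoes = {
    "title": shoes_element.find("./div[@class='title']").text,
    "price": shoes_element.find("./div[@class='price']").text,
    "description": [
        li.text
        for li in shoes_element.findall(".//li[@class='description-item']")
    ],
}

result = {"shoes": shoes}
print(result)
```

Selecting the parent element once keeps the child expressions short and limits them to the shoes subtree.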

### List of nested objects <a href="#list-of-nested-objects" id="list-of-nested-objects"></a>

{% hint style="info" %}
**Use case:** Previously, you wanted to parse only `shoes` information. Now you want to parse the information of all products in the HTML.
{% endhint %}

The [**Sample HTML**](#sample-html) is used again as the document-under-parse.

If you want your parsed result to look like this:

```json
{
    "products": [
        {
            "title": "Shoes",
            "price": "223.12",
            "description": [
                "Super"
            ]
        },
        {
            "title": "Pants",
            "price": "60.12",
            "description": [
                "Amazing",
                "Quality"
            ]
        },
        {
            "title": "Socks",
            "price": "123.12",
            "description": [
                "Very",
                "Nice",
                "Socks"
            ]
        }
    ]
}
```

The parsing instructions would look as follows:

*Example 4. Parsing all products found in the HTML document.*

```json
{
    "products": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//div[@class='product']"]
            }
        ],
        "_items": {
            "title": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": ["./div[@class='title']/text()"]
                    }
                ]
            },
            "price": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": ["./div[@class='price']/text()"]
                    }
                ]
            },
            "description": {
                "_fns": [
                    {
                        "_fn": "xpath",
                        "_args": [".//li[@class='description-item']/text()"]
                    }
                ]
            }
        }
    }
}
```

The parsing instruction structure looks similar to the one in Example 3. However, there are two major differences:

1. `xpath` is used instead of `xpath_one` in `products._fns` pipeline. `products._fns` pipeline will now output a list of all elements that match the provided XPath expression (a list of product elements).
2. `_items` reserved property is used to indicate that you want to form a list by iterating through each item of the `products._fns` pipeline output and **passing/processing every list item separately** down the pipeline scope.

If the `_items` reserved property were not used in the Example 4 parsing instructions, the parsed result would look as follows:

```json
{
    "products": {
        "title": [
            "Shoes",
            "Pants",
            "Socks"
        ],
        "price": [
            "223.12",
            "60.12",
            "123.12"
        ],
        "description": [
            [
                "Super"
            ],
            [
                "Amazing",
                "Quality"
            ],
            [
                "Very",
                "Nice",
                "Socks"
            ]
        ]
    }
}
```

{% hint style="warning" %}
`_items` is used to specify that the Custom Parser must pass ***separate list items*** instead of the ***whole list*** down the parsing instructions.
{% endhint %}
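
The difference between the two result shapes can be sketched in Python as follows. This is a rough emulation under stated assumptions, not the Custom Parser implementation: the helper `text_of` is a hypothetical stand-in for `xpath_one`, and the HTML is a shortened version of the Sample HTML without descriptions.

```python
import xml.etree.ElementTree as ET

HTML = """
<div id="products">
    <div class="product">
        <div class="title">Shoes</div><div class="price">223.12</div>
    </div>
    <div class="product">
        <div class="title">Pants</div><div class="price">60.12</div>
    </div>
    <div class="product">
        <div class="title">Socks</div><div class="price">123.12</div>
    </div>
</div>
"""


def text_of(element, css_class):
    """Hypothetical stand-in for xpath_one on ./div[@class=...]/text()."""
    return element.find(f"./div[@class='{css_class}']").text


# Role of products._fns: a list of all product elements.
products = ET.fromstring(HTML).findall(".//div[@class='product']")
fields = ["title", "price"]

# With _items: each list item goes through the nested scope separately,
# which yields one object per product.
with_items = [{f: text_of(p, f) for f in fields} for p in products]

# Without _items: the whole list goes through each nested pipeline,
# which yields one list per field.
without_items = {f: [text_of(p, f) for p in products] for f in fields}

print(with_items)
print(without_items)
```

The same values appear in both results. Only the grouping changes: a list of objects with `_items`, an object of lists without it.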

### Select the N-th element from a list <a href="#select-n-th-element-from-a-list" id="select-n-th-element-from-a-list"></a>

This section demonstrates the flexibility of pipelines. The same problem can be approached in different ways.

Multiple options can be used to select the N-th element from a list of any values.

{% hint style="info" %}
**Use case:** you want to select the second product price from the page.
{% endhint %}

The [**Sample HTML**](#sample-html) is again used as an example. You have multiple options to select the price of the 2nd product.

#### Option 1 <a href="#option-1" id="option-1"></a>

You can utilize the XPath `[]` selector and define selection in the XPath expression.

*Example 5. Select 2nd price using XPath \[] selector.*

```json
{
    "second_price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [
                    "(//div[@class='price'])[2]/text()"
                ]
            }
        ]
    }
}
```

Result:

```json
{
    "second_price": [
        "60.12"
    ]
}
```

#### Option 2 <a href="#option-2" id="option-2"></a>

You can also use the `xpath` function to find all prices and pipe them to the `select_nth` function, which selects the n-th element from the extracted list of prices. The index is zero-based, so `1` selects the second element.

*Example 6. Select the 2nd value using the* `select_nth` *function.*

```json
{
    "second_price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [
                    "//div[@class='price']/text()"
                ]
            },
            {
                "_fn": "select_nth",
                "_args": 1
            }
        ]
    }
}
```

Result:

```json
{
    "second_price": "60.12"
}
```

{% hint style="warning" %}
Notice how the `select_nth` function returns an item from a list while the `xpath` function returns a list of items, even if a single item is found.
{% endhint %}

#### Option 3 <a href="#option-3" id="option-3"></a>

You can use `select_nth` with any list type, including lists of HTML elements:

*Example 7. Selecting all product HTML elements with* `class="product"` *==> selecting 2nd product element from the list ==> extracting price text from the selected product HTML element*.

```json
{
    "second_price": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//div[@class='product']"]
            },
            {
                "_fn": "select_nth",
                "_args": 1
            },
            {
                "_fn": "xpath",
                "_args": ["./div[@class='price']/text()"]
            }
        ]
    }
}
```

Result:

```json
{
    "second_price": ["60.12"]
}
```
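
As a rough illustration of why Options 2 and 3 return differently shaped values, the sketch below emulates both pipelines in Python. The `select_nth` function here is a hypothetical stand-in written for this example, and its zero-based indexing is inferred from the examples above, where an argument of `1` yields the second item. `.text` replaces `text()` because `xml.etree.ElementTree` does not support it.

```python
import xml.etree.ElementTree as ET

HTML = """
<body>
    <div class="product" id="shoes"><div class="price">223.12</div></div>
    <div class="product" id="pants"><div class="price">60.12</div></div>
    <div class="product" id="socks"><div class="price">123.12</div></div>
</body>
"""


def select_nth(values, n):
    """Hypothetical stand-in: zero-based selection from a list of any values."""
    return values[n]


root = ET.fromstring(HTML)

# Option 2: list of price strings -> second item (a plain string).
prices = [div.text for div in root.findall(".//div[@class='price']")]
option_2 = select_nth(prices, 1)

# Option 3: list of product elements -> second element -> xpath on that
# element. The last step is list-returning again, so the result is a list.
second_product = select_nth(root.findall(".//div[@class='product']"), 1)
option_3 = [div.text for div in second_product.findall("./div[@class='price']")]

print(option_2)  # 60.12
print(option_3)  # ['60.12']
```

The last function in the pipeline decides the shape of the result: ending with `select_nth` gives a single value, ending with `xpath` gives a list.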

### Error handling <a href="#error-handling" id="error-handling"></a>

When given the following HTML snippet:

```html
<div class="product" id="shoes">
    <div class="title">Shoes</div>
    <div class="price">223.12</div>
    <div class="description">Super</div>
</div>
```

And trying to parse it with the following parsing instructions:

```json
{
    "product": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": ["//div[@id='shoes']"]
            }
        ],
        "price": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//div[@class='price']/text()"]
                }
            ]
        },
        "title": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//div[@class='title']/text()"]
                }
            ]
        },
        "description": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//div[@class='description']/text()"]
                },
                {
                    "_fn": "convert_to_float"
                }
            ]
        }
    }
}
```

Custom Parser will return a parsed result in which `price` and `title` are parsed normally, but `description` is `null` because the `convert_to_float` function fails to convert the string `Super` to a float:

```json
{
    "product": {
        "price": "223.12",
        "title": "Shoes",
        "description": null
    },
    "_warnings": [
        {
            "_fn": "convert_to_float",
            "_fn_idx": 1,
            "_msg": "Failed to process function.",
            "_path": ".product.description"
        }
    ]
}
```

By default, all errors are reported as warnings and placed in the `_warnings` list. To ignore errors when parsing a field, suppress them with the `"_on_error": "suppress"` parameter:

```json
{
    "product": {
        ...,
        "description": {
            "_on_error": "suppress",
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//div[@class='description']/text()"]
                },
                {
                    "_fn": "convert_to_float"
                }
            ]
        }
    }
}
```

This produces the following result:

```json
{
    "product": {
        "price": "223.12",
        "title": "Shoes",
        "description": null
    }
}
```
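
The warning and suppression behavior described above can be modeled with a small Python sketch. Everything in it is an assumption made for illustration: `run_pipeline`, `first`, and the local `convert_to_float` are hypothetical stand-ins, and only the warning fields (`_fn`, `_fn_idx`, `_msg`, `_path`) are taken from the result shown earlier.

```python
def first(values):
    """Stand-in for xpath_one: return the first matched value."""
    return values[0]


def convert_to_float(value):
    """Stand-in for the convert_to_float function."""
    return float(value)


def run_pipeline(value, fns, path, warnings, on_error=None):
    """Hypothetical runner: apply fns in order; on failure return None.

    Unless on_error is "suppress", a failure is recorded in `warnings`.
    """
    for idx, fn in enumerate(fns):
        try:
            value = fn(value)
        except (TypeError, ValueError):
            if on_error != "suppress":
                warnings.append({
                    "_fn": fn.__name__,
                    "_fn_idx": idx,
                    "_msg": "Failed to process function.",
                    "_path": path,
                })
            return None
    return value


warnings = []
pipeline = [first, convert_to_float]

price = run_pipeline(["223.12"], pipeline, ".product.price", warnings)
description = run_pipeline(["Super"], pipeline, ".product.description", warnings)
suppressed = run_pipeline(
    ["Super"], pipeline, ".product.description", warnings, on_error="suppress"
)

print(price, description, suppressed)
print(warnings)
```

In this model the failing field is `None` in both cases, and suppression only controls whether a warning entry is recorded.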

### Array of arrays <a href="#array-of-arrays" id="array-of-arrays"></a>

Custom Parser allows N-dimensional arrays in parsed results. As an example, let’s use the following HTML snippet:

```html
<div class="row">
    <div class="column">1</div>
    <div class="column">2</div>
    <div class="column">3</div>
</div>
<div class="row">
    <div class="column">4</div>
    <div class="column">5</div>
    <div class="column">6</div>
</div>
<div class="row">
    <div class="column">7</div>
    <div class="column">8</div>
    <div class="column">9</div>
</div>
```

Let's say you want to parse the document so that the result is a 3x3 two-dimensional array of integers:

```json
{
    "table": [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
}
```

To parse the HTML into the JSON above, you can use the following parsing instructions:

```json
{
    "table": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": ["//div[@class='row']"]
            },
            {
                "_fn": "xpath",
                "_args": [".//div[@class='column']/text()"]
            },
            {
                "_fn": "convert_to_int"
            }
        ]
    }
}
```
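
One way to picture how a single pipeline could produce a nested array is to assume that each function is applied element-wise whenever its input is a list. That assumption is inferred from the results shown on this page and is not a confirmed description of Custom Parser internals. The sketch below uses a hypothetical `apply` helper, wraps the rows in a `<body>` element so the snippet parses as XML, and reads `.text` because `xml.etree.ElementTree` has no `text()` support.

```python
import xml.etree.ElementTree as ET

HTML = """
<body>
    <div class="row">
        <div class="column">1</div>
        <div class="column">2</div>
        <div class="column">3</div>
    </div>
    <div class="row">
        <div class="column">4</div>
        <div class="column">5</div>
        <div class="column">6</div>
    </div>
    <div class="row">
        <div class="column">7</div>
        <div class="column">8</div>
        <div class="column">9</div>
    </div>
</body>
"""


def apply(fn, value):
    """Apply fn to one value, or recursively element-wise to a list."""
    if isinstance(value, list):
        return [apply(fn, v) for v in value]
    return fn(value)


def column_texts(row):
    """Stand-in for xpath with .//div[@class='column']/text() on one row."""
    return [col.text for col in row.findall(".//div[@class='column']")]


root = ET.fromstring(HTML)

rows = root.findall(".//div[@class='row']")  # step 1: list of row elements
columns = apply(column_texts, rows)          # step 2: list of lists of strings
table = apply(int, columns)                  # step 3: list of lists of ints

print({"table": table})
```

Each step keeps the nesting produced by the previous one, so two list-returning steps followed by a conversion give a two-dimensional array of integers.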
