# Parsing instruction examples

The following HTML snippet is parsed using the example parsing instructions in the upcoming sections.

### Sample HTML

```html

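<div class="product" id="shoes">
    <div class="title">Shoes</div>
    <div class="price">223.12</div>
    <ul>
        <li class="description-item">Super</li>
    </ul>
</div>
<div class="product">
    <div class="title">Pants</div>
    <div class="price">60.12</div>
    <ul>
        <li class="description-item">Amazing</li>
        <li class="description-item">Quality</li>
    </ul>
</div>
<div class="product">
    <div class="title">Socks</div>
    <div class="price">123.12</div>
    <ul>
        <li class="description-item">Very</li>
        <li class="description-item">Nice</li>
        <li class="description-item">Socks</li>
    </ul>
</div>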

```

### Bare minimum

{% hint style="info" %}
**Use case:** you want to extract the text from all **shoes** **description** **items**.
{% endhint %}

*Example 1. Shoes description items selection using XPath.*

```json
{
    "shoes_description": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [".//div[@id='shoes']//li[@class='description-item']/text()"]
            }
        ]
    }
}
```

The `xpath` function finds a single matching item and returns it as a string inside a list:

```json
{
    "shoes_description": ["Super"]
}
```

The exact `xpath` function behavior is described [**here**](/products/web-scraper-api/features/custom-parser/writing-instructions-manually/list-of-functions.md).

### Nested parsing instructions

{% hint style="info" %}
**Use case:** you want to parse all information related to shoes. The parsed result should also reflect the document structure of the provided HTML.
{% endhint %}

You are targeting this part of the Sample HTML:

```html

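<div class="product" id="shoes">
    <div class="title">Shoes</div>
    <div class="price">223.12</div>
    <ul>
        <li class="description-item">Super</li>
    </ul>
</div>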

```

You would like the parsed result to have the following structure:

```json
{
    "shoes": {
        "title": "Shoes",
        "price": "223.12",
        "description": ["Super"]
    }
}
```

The parsing instructions would look as follows.

*Example 2. Parsing instructions used to parse* `shoes` *information.*

```json
{
    "shoes": {
        "title": {
            "_fns": [
                { "_fn": "xpath_one", "_args": ["//div[@id='shoes']/div[@class='title']/text()"] }
            ]
        },
        "price": {
            "_fns": [
                { "_fn": "xpath_one", "_args": ["//div[@id='shoes']/div[@class='price']/text()"] }
            ]
        },
        "description": {
            "_fns": [
                { "_fn": "xpath", "_args": ["//div[@id='shoes']//li[@class='description-item']/text()"] }
            ]
        }
    }
}
```

`xpath_one` works similarly to `xpath`, but instead of returning a list of all matches, it **returns the first matched item**.

In the example above, the `shoes` property is the only property defined in the outermost instructions scope. The `shoes` property contains nested parsing instructions.

The `shoes` instructions scope does not have a pipeline defined (the `_fns` property is missing). This means the pipelines defined in the `title`, `price`, and `description` scopes will use the document-under-parse as their pipeline input.

In Example 2, `//div[@id='shoes']` is repeated in every XPath expression. The repetition can be avoided by defining a pipeline in the `shoes` scope:

*Example 3. Defining a pipeline in* `shoes` *scope instructions to avoid XPath expression repetition.*

```json
{
    "shoes": {
        "_fns": [
            { "_fn": "xpath_one", "_args": ["//div[@id='shoes']"] }
        ],
        "title": {
            "_fns": [
                { "_fn": "xpath_one", "_args": ["./div[@class='title']/text()"] }
            ]
        },
        "price": {
            "_fns": [
                { "_fn": "xpath_one", "_args": ["./div[@class='price']/text()"] }
            ]
        },
        "description": {
            "_fns": [
                { "_fn": "xpath", "_args": [".//li[@class='description-item']/text()"] }
            ]
        }
    }
}
```

Using the parsing instructions provided in Example 3, Custom Parser will:

1. Start by processing the `shoes._fns` pipeline, which outputs the `shoes` HTML element;
2. Take the `shoes._fns` pipeline output and use it as the input for the pipelines defined in the `title`, `price`, and `description` scopes;
3. Process the `title`, `price`, and `description` pipelines to produce the final values.

The result is the same as in Example 2:

```json
{
    "shoes": {
        "title": "Shoes",
        "price": "223.12",
        "description": ["Super"]
    }
}
```

The main difference between Example 2 and Example 3 is that in Example 3 a pipeline is defined in the `shoes` scope. **This additional pipeline selects the shoes element and passes it on to the pipelines found deeper in the instructions hierarchy.**

### List of nested objects

{% hint style="info" %}
**Use case:** Previously, you wanted to parse only `shoes` information. Now you want to parse the information of all products in the HTML.
{% endhint %}

The [**Sample HTML**](/products/web-scraper-api/features/custom-parser/writing-instructions-manually/parsing-instruction-examples.md#sample-html) is used again as the document-under-parse. Suppose you want your parsed result to look like this:

```json
{
    "products": [
        {
            "title": "Shoes",
            "price": "223.12",
            "description": ["Super"]
        },
        {
            "title": "Pants",
            "price": "60.12",
            "description": ["Amazing", "Quality"]
        },
        {
            "title": "Socks",
            "price": "123.12",
            "description": ["Very", "Nice", "Socks"]
        }
    ]
}
```

The parsing instructions would look as follows:

*Example 4. Parsing all products found in the HTML document.*

```json
{
    "products": {
        "_fns": [
            { "_fn": "xpath", "_args": ["//div[@class='product']"] }
        ],
        "_items": {
            "title": {
                "_fns": [
                    { "_fn": "xpath_one", "_args": ["./div[@class='title']/text()"] }
                ]
            },
            "price": {
                "_fns": [
                    { "_fn": "xpath_one", "_args": ["./div[@class='price']/text()"] }
                ]
            },
            "description": {
                "_fns": [
                    { "_fn": "xpath", "_args": [".//li[@class='description-item']/text()"] }
                ]
            }
        }
    }
}
```

The parsing instruction structure looks similar to the one in Example 3. However, there are two major differences:

1. `xpath` is used instead of `xpath_one` in the `products._fns` pipeline. The `products._fns` pipeline now outputs a list of all elements that match the provided XPath expression (a list of product elements).
2. The `_items` reserved property is used to indicate that you want to form a list by iterating through each item of the `products._fns` pipeline output and **passing/processing every list item separately** down the pipeline scope.

If the `_items` reserved property wasn't used in the Example 4 parsing instructions, the parsed result would look as follows:

```json
{
    "products": {
        "title": ["Shoes", "Pants", "Socks"],
        "price": ["223.12", "60.12", "123.12"],
        "description": [
            ["Super"],
            ["Amazing", "Quality"],
            ["Very", "Nice", "Socks"]
        ]
    }
}
```

{% hint style="warning" %}
`_items` is used to specify that the Custom Parser must pass ***separate list items*** instead of the ***whole list*** down the parsing instructions.
{% endhint %}

### Select the N-th element from a list

This section demonstrates the flexibility of pipelines: the same problem can be approached in different ways. There are multiple options for selecting the N-th element from a list of any values.

{% hint style="info" %}
**Use case:** you want to select the second product price from the page.
{% endhint %}

The [**Sample HTML**](#sample-html) is again used as an example. You have multiple options to select the 2nd product price.

#### Option 1

You can use the XPath `[]` selector and define the selection in the XPath expression.

*Example 5. Select the 2nd price using the XPath* `[]` *selector.*

```json
{
    "second_price": {
        "_fns": [
            { "_fn": "xpath", "_args": ["(//div[@class='price'])[2]/text()"] }
        ]
    }
}
```

Result:

```json
{
    "second_price": ["60.12"]
}
```

#### Option 2

You can also use the `xpath` function to find all prices and pipe them to the `select_nth` function, which selects the n-th element from the extracted list of prices.

*Example 6. Select the 2nd value using the* `select_nth` *function.*

```json
{
    "second_price": {
        "_fns": [
            { "_fn": "xpath", "_args": ["//div[@class='price']/text()"] },
            { "_fn": "select_nth", "_args": 1 }
        ]
    }
}
```

Result:

```json
{
    "second_price": "60.12"
}
```

Note that the argument `1` selects the second price, so `select_nth` counts from zero in this example.

{% hint style="warning" %}
Notice how the `select_nth` function returns an item from a list while the `xpath` function returns a list of items, even if a single item is found.
{% endhint %}

#### Option 3

You can use `select_nth` with any list type, including lists of HTML elements:

*Example 7. Select all product HTML elements with* `class="product"`*, select the 2nd product element from the list, then extract the price text from the selected product HTML element.*

```json
{
    "second_price": {
        "_fns": [
            { "_fn": "xpath", "_args": ["//div[@class='product']"] },
            { "_fn": "select_nth", "_args": 1 },
            { "_fn": "xpath", "_args": ["./div[@class='price']/text()"] }
        ]
    }
}
```

Result:

```json
{
    "second_price": ["60.12"]
}
```

### Error handling

Consider the following HTML snippet:

```html

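<div id="shoes">
    <div class="title">Nice Shoes</div>
    <div class="price">223.12</div>
    <div class="description">Super</div>
</div>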

```

Suppose you parse this snippet with the following parsing instructions:

```json
{
    "product": {
        "_fns": [
            { "_fn": "xpath_one", "_args": ["//div[@id='shoes']"] }
        ],
        "price": {
            "_fns": [
                { "_fn": "xpath_one", "_args": ["//div[@class='price']/text()"] }
            ]
        },
        "title": {
            "_fns": [
                { "_fn": "xpath_one", "_args": ["//div[@class='title']/text()"] }
            ]
        },
        "description": {
            "_fns": [
                { "_fn": "xpath_one", "_args": ["//div[@class='description']/text()"] },
                { "_fn": "convert_to_float" }
            ]
        }
    }
}
```

Custom Parser returns a parsed result in which `price` and `title` are parsed normally, but `description` fails to parse because the `convert_to_float` function cannot convert the `string` to a `float`:

```json
{
    "product": {
        "price": "223.12",
        "title": "Nice Shoes",
        "description": null
    },
    "_warnings": [
        {
            "_fn": "convert_to_float",
            "_fn_idx": 1,
            "_msg": "Failed to process function.",
            "_path": ".product.description"
        }
    ]
}
```

By default, all errors are counted as warnings and are placed inside the `_warnings` list. If you would like to ignore errors when parsing a field, you can suppress the warnings/errors with the `"_on_error": "suppress"` parameter:

```json
{
    "product": {
        ...,
        "description": {
            "_on_error": "suppress",
            "_fns": [
                { "_fn": "xpath_one", "_args": ["//div[@class='description']/text()"] },
                { "_fn": "convert_to_float" }
            ]
        }
    }
}
```

This produces the following result:

```json
{
    "product": {
        "price": "223.12",
        "title": "Nice Shoes",
        "description": null
    }
}
```

### Array of arrays

Custom Parser allows N-dimensional arrays in parsed results. As an example, let's use the following HTML snippet:

```html
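<div class="row">
    <div class="column">1</div>
    <div class="column">2</div>
    <div class="column">3</div>
</div>
<div class="row">
    <div class="column">4</div>
    <div class="column">5</div>
    <div class="column">6</div>
</div>
<div class="row">
    <div class="column">7</div>
    <div class="column">8</div>
    <div class="column">9</div>
</div>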
```

Suppose you want to parse the document so that the result is a 3x3 two-dimensional array of integers:

```json
{
    "table": [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
}
```

To parse the HTML into the JSON above, you can use the following parsing instructions:

```json
{
    "table": {
        "_fns": [
            { "_fn": "xpath", "_args": ["//div[@class='row']"] },
            { "_fn": "xpath", "_args": [".//div[@class='column']/text()"] },
            { "_fn": "convert_to_int" }
        ]
    }
}
```
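To make the nesting in the `table` pipeline from the Array of arrays section easier to follow, here is a minimal Python sketch that emulates it with the standard library. It is not Custom Parser's implementation. The `map_nested` and `parse_table` helpers, the `MARKUP` string, and its wrapping `<div id="table">` element are assumptions made for illustration. The sketch assumes that each pipeline function is applied to every item of a list it receives, which is what the expected 3x3 result suggests.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the row/column structure is inferred from the XPath
# expressions; the outer <div id="table"> wrapper is hypothetical and
# only there to give the XML parser a single root element.
MARKUP = """
<div id="table">
  <div class="row">
    <div class="column">1</div><div class="column">2</div><div class="column">3</div>
  </div>
  <div class="row">
    <div class="column">4</div><div class="column">5</div><div class="column">6</div>
  </div>
  <div class="row">
    <div class="column">7</div><div class="column">8</div><div class="column">9</div>
  </div>
</div>
"""


def map_nested(fn, value):
    """Apply fn to every non-list item, keeping the list nesting intact."""
    if isinstance(value, list):
        return [map_nested(fn, item) for item in value]
    return fn(value)


def parse_table(markup):
    """Emulate the three pipeline steps: xpath, xpath, convert_to_int."""
    root = ET.fromstring(markup)
    # Step 1: one element in, a list of row elements out.
    rows = map_nested(lambda el: el.findall(".//div[@class='row']"), root)
    # Step 2: each row becomes a list of column elements (list of lists).
    cells = map_nested(lambda el: el.findall(".//div[@class='column']"), rows)
    # ElementTree has no text() step, so the text is read from .text instead.
    texts = map_nested(lambda el: el.text, cells)
    # Step 3: convert every string while preserving the 2-D shape.
    return map_nested(int, texts)


if __name__ == "__main__":
    print(parse_table(MARKUP))
```

Each step only changes the leaves of the structure, so the second `xpath` call turns a flat list of rows into a list of lists, and the integer conversion leaves that shape untouched.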