Generating parsing instructions via API
You can generate parsing instruction sets via the API by providing URLs and describing the data points you want to parse. Once you receive the generated parsing instructions, you can save them as a parser preset or send them directly with your scraping request.
You can also generate parsing instructions via OxyCopilot on our API playground.
Generate instructions from prompt
You can generate parsing instructions by inputting a free-text description of the data points you would like to parse and giving us a few URLs that belong to the same page type. The API will respond with a set of parsing instructions.
Endpoint: https://data.oxylabs.io/v1/parsers/generate-instructions/prompt
Method: POST
Authentication: Basic
Request headers: Content-Type: application/json
Sample payload
{
    "prompt_text": "Parse title of the product, main price, developer name and platform name.",
    "urls": [
        "https://sandbox.oxylabs.io/products/1",
        "https://sandbox.oxylabs.io/products/2",
        "https://sandbox.oxylabs.io/products/4"
    ],
    "render": false
}
prompt_text: Free-text description of the data points to be parsed.
urls: List of URLs exemplifying the page type you would like to get parsing instructions for.
render: Whether or not JS rendering should be used to fetch the required content.
- mandatory parameter
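The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names (`build_prompt_payload`, `basic_auth_header`, `generate_instructions`) and the timeout value are assumptions made for this example, while the endpoint, method, header, and payload fields come from the description above.

```python
import base64
import json
import urllib.request

PROMPT_ENDPOINT = "https://data.oxylabs.io/v1/parsers/generate-instructions/prompt"


def build_prompt_payload(prompt_text, urls, render=False):
    """Assemble the JSON body with the three documented fields."""
    return {"prompt_text": prompt_text, "urls": list(urls), "render": render}


def basic_auth_header(username, password):
    """Encode credentials as an HTTP Basic Authorization header value."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def generate_instructions(payload, username, password, endpoint=PROMPT_ENDPOINT):
    """POST the payload and return the decoded JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
        method="POST",
    )
    # The 120-second timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like `generate_instructions(build_prompt_payload("Parse title of the product.", ["https://sandbox.oxylabs.io/products/1"]), "USERNAME", "PASSWORD")`, with the placeholder credentials replaced by your own.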
Sample response
{
    "parsing_instructions": {
        "developer_name": {
            "_fns": [
                {
                    "_args": [
                        "//div[contains(@class, 'brand-wrapper')]//span[@class='brand developer']"
                    ],
                    "_fn": "xpath"
                },
                {
                    "_args": [
                        "normalize-space(.)"
                    ],
                    "_fn": "xpath"
                },
                {
                    "_args": " ",
                    "_fn": "join"
                }
            ]
        },
        "main_price": {
            "_fns": [
                {
                    "_args": [
                        "//div[contains(@class, 'product-info-wrapper')]//div[contains(@class, 'price')]/text()"
                    ],
                    "_fn": "xpath_one"
                },
                {
                    "_fn": "amount_from_string"
                }
            ]
        },
        "title": {
            "_fns": [
                {
                    "_args": [
                        "//div[contains(@class, 'product-info-wrapper')]//h2/text()"
                    ],
                    "_fn": "xpath_one"
                },
                {
                    "_args": [
                        "^\\s*(.[\\s\\S]*?)\\s*$",
                        1
                    ],
                    "_fn": "regex_search"
                }
            ]
        }
    },
    "prompt_schema": {
        "properties": {
            "developer_name": {
                "description": "Developer name.",
                "title": "Developer Name",
                "type": "string"
            },
            "main_price": {
                "description": "Main price of the product.",
                "title": "Main Price",
                "type": "number"
            },
            "platform_name": {
                "description": "Platform name.",
                "title": "Platform Name",
                "type": "string"
            },
            "title": {
                "description": "Title of the product.",
                "title": "Title",
                "type": "string"
            }
        },
        "required": [
            "title",
            "main_price",
            "developer_name",
            "platform_name"
        ],
        "title": "Fields",
        "type": "object"
    }
}
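In the sample response above, `prompt_schema` lists `platform_name` as required, yet `parsing_instructions` contains no entry for it. A small check like the following can flag such gaps before you save or reuse the instructions. The function name `fields_without_instructions` is hypothetical, and the sample dictionary is an abbreviated copy of the response shown above.

```python
def fields_without_instructions(response):
    """Return required schema fields that have no generated parsing instructions."""
    required = response.get("prompt_schema", {}).get("required", [])
    instructions = response.get("parsing_instructions", {})
    return [name for name in required if name not in instructions]


# Abbreviated version of the sample response; the "_fns" lists are left empty
# here because only the field names matter for this check.
sample_response = {
    "parsing_instructions": {
        "developer_name": {"_fns": []},
        "main_price": {"_fns": []},
        "title": {"_fns": []},
    },
    "prompt_schema": {
        "required": ["title", "main_price", "developer_name", "platform_name"],
    },
}

print(fields_without_instructions(sample_response))  # ['platform_name']
```

If the list is not empty, one option is to reword the prompt or supply different example URLs and generate the instructions again.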
Generate instructions from JSON schema
When you need parsed data to follow a specific JSON schema, use this endpoint to get parsing instructions that strictly adhere to the schema you provide.
Endpoint: https://data.oxylabs.io/v1/parsers/generate-instructions/schema
Method: POST
Authentication: Basic
Request headers: Content-Type: application/json
Sample payload
{
    "urls": [
        "https://oxylabs.io",
        "https://example.com",
        "https://bbc.co.uk"
    ],
    "prompt_schema": {
        "properties": {
            "links": {
                "description": "An array of URL strings",
                "type": "array",
                "items": {
                    "type": "string",
                    "description": "A URL"
                }
            }
        },
        "required": [
            "links"
        ]
    },
    "render": false
}
prompt_schema: JSON schema describing the required parser output.
urls: List of URLs exemplifying the page type you would like to get parsing instructions for.
render: Whether or not JS rendering should be used to fetch the required content.
- mandatory parameter
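A schema payload like the sample above can be assembled programmatically. The sketch below builds the request body and, as an extra local safeguard that is not part of the API, rejects schemas whose `required` list names a field missing from `properties`. The helper name `build_schema_payload` is an assumption made for this example.

```python
import json

SCHEMA_ENDPOINT = "https://data.oxylabs.io/v1/parsers/generate-instructions/schema"


def build_schema_payload(urls, prompt_schema, render=False):
    """Assemble the JSON body for the schema endpoint after a basic sanity check."""
    properties = prompt_schema.get("properties", {})
    unknown = [name for name in prompt_schema.get("required", []) if name not in properties]
    if unknown:
        # Local check only; the API's own validation rules are not described here.
        raise ValueError(f"required fields missing from properties: {unknown}")
    return {"urls": list(urls), "prompt_schema": prompt_schema, "render": render}


links_schema = {
    "properties": {
        "links": {
            "description": "An array of URL strings",
            "type": "array",
            "items": {"type": "string", "description": "A URL"},
        }
    },
    "required": ["links"],
}

payload = build_schema_payload(["https://oxylabs.io", "https://example.com"], links_schema)
body = json.dumps(payload)  # Request body to POST to SCHEMA_ENDPOINT with Basic authentication.
```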
Sample response
{
    "parsing_instructions": {
        "links": {
            "_fns": [
                {
                    "_args": [
                        "//a[@href and normalize-space(@href) != '']/@href"
                    ],
                    "_fn": "xpath"
                }
            ]
        }
    }
}
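Once you have a response like the one above, the generated instructions can be sent along with a scraping request. The sketch below only builds such a request body. The field names `source`, `url`, `parse`, and `parsing_instructions`, as well as the `universal` source value, are assumptions about the scraping request format that this page does not define, so verify them against the scraper API documentation before use.

```python
def build_scraping_payload(url, generated, source="universal"):
    """Attach generated parsing instructions to a scraping request body (assumed format)."""
    return {
        "source": source,  # Assumed field and default value.
        "url": url,
        "parse": True,  # Assumed flag that enables parsing.
        "parsing_instructions": generated["parsing_instructions"],
    }


# Response in the shape shown in the sample above.
generated = {
    "parsing_instructions": {
        "links": {
            "_fns": [
                {
                    "_args": ["//a[@href and normalize-space(@href) != '']/@href"],
                    "_fn": "xpath",
                }
            ]
        }
    }
}

scraping_payload = build_scraping_payload("https://example.com", generated)
```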