OxyCopilot (Beta)

OxyCopilot is a free Web Scraper API feature that simplifies onboarding and helps users find effective solutions for complex use cases without coding knowledge. It currently includes two separate features: Scraper builder and Custom Parser builder.

OxyCopilot is accessible through the Scraper API Playground in the Oxylabs dashboard.

Scraper builder

OxyCopilot helps users build a scraper (request code) for the Web Scraper API without needing to understand the documentation or field logic.

How it works

Step 1: Provide a URL and prompt

  • URL: Provide the URL that you want to scrape.

  • Prompt: Describe your requirements (e.g., localization, JS rendering, etc.).

Step 2: Parsing

You have three options for handling parsing:

  1. Custom Parser: Select "Add parsing instructions" to create your own parsing logic using the Custom Parser builder.

  2. Dedicated Parser: If the URL is from a website for which we provide a dedicated parser and you want to use it, select "Continue with Dedicated Parser".

  3. No Parsing: Choose to proceed without parsing if structured data isn’t needed.

If the URL belongs to a website for which we have a dedicated parser, but you don’t need structured data, select "Continue with Dedicated Parser" and disable the parse parameter in the playground's settings. Avoid using the exit button, as it won’t save the pre-filled parameters.

Step 3: Review the request

Based on your prompt, OxyCopilot pre-fills the necessary parameters in the Playground. You will see the specific request code and parameters for your use case, and you can adjust the parameters if needed.

Step 4: Submit the request and copy

If everything looks good, submit the request to check that the output works as expected. Then copy the request code to use in your further scraping tasks with the Web Scraper API.

Example

URL

https://www.amazon.de/s?k=adidas

Prompt

Scrape the Amazon search page from the provided URL and localize the results to Poland.

AI-generated parameters (JSON)

{
    "source": "amazon_search",
    "query": "adidas",
    "geo_location": "PL",
    "domain": "de"
}

AI-generated request codes
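
The generated request code is not reproduced in this text. As a rough illustration only, the sketch below shows how the parameters above might be sent from Python. The endpoint `https://realtime.oxylabs.io/v1/queries`, the use of HTTP basic authentication, and the `USERNAME`/`PASSWORD` placeholders are assumptions rather than OxyCopilot output, so treat the code copied from the Playground as authoritative.

```python
import base64
import json
import urllib.request

# Assumed realtime endpoint; verify against the code the Playground generates.
API_URL = "https://realtime.oxylabs.io/v1/queries"

# Parameters taken from the AI-generated JSON above.
payload = {
    "source": "amazon_search",
    "query": "adidas",
    "geo_location": "PL",
    "domain": "de",
}


def build_request(payload: dict, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the payload as JSON."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_request(payload, "USERNAME", "PASSWORD")  # placeholder credentials

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the body and headers before spending any API traffic.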

Custom Parser builder

Leverage the Custom Parser feature with OxyCopilot to build a parser without needing to write code or analyze website structures manually.

How it works

Step 1: Provide URL(s) and prompt

  • URL(s): You can provide up to 3 URLs for which you want to generate parsing instructions. OxyCopilot uses the HTML of the provided URLs to determine the best logic for extracting the required fields.

The more URLs you provide, the more robust the parsing instructions will be, as OxyCopilot identifies common patterns across similar pages. Note that additional URLs may increase the wait time for results.

  • Prompt: The prompt is the key component in building a natural language schema, which serves as the basis for generating the actual parsing instructions. The prompt should clearly describe the fields that need to be parsed.

Step 2 [Optional]: Adjust parsing schema

This step allows you to fine-tune the parsing schema to better meet your needs or troubleshoot any issues.

Parsing schema overview

This table visualizes the input used by the AI to generate parsing instructions. The schema defines which fields need to be parsed and consists of various object types (explained in the table below).

Each item in the schema must have:

  • Name: This will be used as the object key in the parsing instructions and visible in the parsed data.

  • Description (optional but recommended): Helps improve parsing accuracy.

Schema adjustments

  • Reorder items: Drag and drop items using the dots on the left side to change their order (only items within the same nesting level can be moved).

  • Edit items: Click the edit icon to modify any field.

  • Delete items: You can delete any item at the parent level.

  • Add new items: Add new items to the parent level.

Once you update the schema, click the "Refresh output" button to regenerate the instructions and preview the parsed data.

Object type explanations

Working with array of objects

  1. Select "Array of objects": This option adds a child object and button.

  2. Fill out object names: To save the item to the schema, you must fill out the names of both the parent and child objects. Once done, the checkmark will turn green.

  3. Child object requirement: An "Array of objects" must have at least one child.
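
The schema rules described above (every item needs a name, and an "Array of objects" needs at least one child) can be pictured with a small data model. This is a hypothetical sketch for illustration: `SchemaItem`, `validation_errors`, and the `reviews`/`author` field names are invented here and are not OxyCopilot types or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SchemaItem:
    """Hypothetical model of one schema row; not an OxyCopilot data type."""
    name: str
    description: str = ""  # optional but recommended
    is_array_of_objects: bool = False
    children: list[SchemaItem] = field(default_factory=list)


def validation_errors(item: SchemaItem) -> list[str]:
    """Return the rule violations for an item and, recursively, its children."""
    errors = []
    if not item.name.strip():
        errors.append("every item needs a name")
    if item.is_array_of_objects and not item.children:
        errors.append(f"array of objects '{item.name}' needs at least one child")
    for child in item.children:
        errors.extend(validation_errors(child))
    return errors


reviews = SchemaItem("reviews", is_array_of_objects=True)
print(validation_errors(reviews))  # one error: the array has no child yet

reviews.children.append(SchemaItem("author", "name of the review author"))
print(validation_errors(reviews))  # []
```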

Testing the instructions

By default, parsed data is based on the first URL provided in Step 1. You can also provide a different URL to test the parsing instructions.

Instructions are generated based on the initial URLs and do not account for the test URLs. Editing the prompt or URLs will reset the schema, requiring a full regeneration.

Step 3: Copy/Save instructions and integrate into scraping jobs

Once the instructions are satisfactory:

  • Use the "Copy" button to copy the instructions and paste them into your scraper code.

  • Alternatively, save the instructions to your API Playground session, adjust other request parameters, test, and then copy the complete request code in your preferred programming language.
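
To illustrate the integration step, the sketch below attaches copied instructions to a request payload. The `universal` source and the `parse` and `parsing_instructions` field names are assumptions about the request format, and the single XPath is a shortened stand-in; the complete request code copied from the Playground should take precedence.

```python
import json

# Instructions as copied from OxyCopilot (shortened here to a single,
# simplified field for illustration).
parsing_instructions = {
    "product_title": {
        "_fns": [
            {"_fn": "xpath_one", "_args": ["//h2/text()"]},
        ]
    }
}


def build_payload(url: str, instructions: dict) -> dict:
    """Combine a target URL with parsing instructions (field names are assumed)."""
    return {
        "source": "universal",
        "url": url,
        "parse": True,
        "parsing_instructions": instructions,
    }


payload = build_payload("https://sandbox.oxylabs.io/products/1", parsing_instructions)
print(json.dumps(payload, indent=2))
```

Keeping the instructions in their own variable means they can be regenerated in OxyCopilot and pasted over without touching the rest of the request.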

Example

URL

https://sandbox.oxylabs.io/products/1

Prompt

I want to parse a product page. The parsed data should include the following fields:

- product_title: a text field containing the product title
- price: a numeric field containing the product price
- related_products: a list containing the titles of related products displayed below the main product information

Parsing schema

Parsing instructions

{
    "product_title": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [
                    "//h2[@class=\"title css-1k75zwy e1pl6npa11\"]/text()",
                    "//div[@class=\"product-info-wrapper css-m2w3q2 emlf3670\"]/h2/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[1]/div[2]/div[2]/h2/text()"
                ]
            },
            {
                "_fn": "regex_search",
                "_args": [
                    "^\\s*(.[\\s\\S]*?)\\s*$",
                    1
                ]
            }
        ]
    },
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [
                    "//div[@class=\"price css-o7uf8d e1pl6npa6\"]/text()",
                    "//div[@class=\"product-info-wrapper css-m2w3q2 emlf3670\"]/div[4]/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[1]/div[2]/div[2]/div[4]/text()"
                ]
            },
            {
                "_fn": "amount_from_string"
            }
        ]
    },
    "related_products": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [
                    "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]/a[1]/h4/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div/a[1]/h4/text()",
                    "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div/a[1]/h4/text()",
                    "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div/a[1]/h4/text()",
                    "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
                    "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
                    "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
                    "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
                    "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
                    "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()"
                ]
            },
            {
                "_fn": "regex_search",
                "_args": [
                    "^\\s*(.[\\s\\S]*?)\\s*$",
                    1
                ]
            }
        ]
    }
}
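
As a rough illustration of the `regex_search` step in these instructions, the pattern can be tried locally with Python's `re` module. It captures the text with surrounding whitespace removed. Whether the parser's regex engine behaves exactly like Python's is an assumption, and the helper name `trim_with_pattern` is hypothetical.

```python
import re
from typing import Optional

# The pattern from the instructions above, after JSON unescaping.
PATTERN = r"^\s*(.[\s\S]*?)\s*$"


def trim_with_pattern(text: str, group: int = 1) -> Optional[str]:
    """Return the requested capture group, or None when nothing matches."""
    match = re.search(PATTERN, text)
    return match.group(group) if match else None


print(trim_with_pattern("\n   The Legend of Zelda: Ocarina of Time  \n"))
# The Legend of Zelda: Ocarina of Time
```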

Parsed data

{
    "price": 91.99,
    "product_title": "The Legend of Zelda: Ocarina of Time",
    "related_products": [
        "The Legend of Zelda: Majora's Mask",
        "Indiana Jones and the Infernal Machine"
    ],
    "parse_status_code": 12000
}
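
When consuming this output in code, it can help to separate your own fields from the `parse_status_code` entry that accompanies them. The sketch below assumes that `12000`, the value shown above, indicates a successful parse; confirm the meaning of status codes in the Custom Parser documentation. The helper `split_parsed` is hypothetical.

```python
import json

# Parsed data exactly as shown in the example above.
raw = """
{
    "price": 91.99,
    "product_title": "The Legend of Zelda: Ocarina of Time",
    "related_products": [
        "The Legend of Zelda: Majora's Mask",
        "Indiana Jones and the Infernal Machine"
    ],
    "parse_status_code": 12000
}
"""

ASSUMED_SUCCESS_CODE = 12000  # assumption: the code seen in the example output


def split_parsed(parsed: dict) -> tuple:
    """Separate the user-defined fields from the parse status code."""
    fields = {k: v for k, v in parsed.items() if k != "parse_status_code"}
    return fields, parsed.get("parse_status_code")


fields, status = split_parsed(json.loads(raw))
if status == ASSUMED_SUCCESS_CODE:
    print(sorted(fields))  # ['price', 'product_title', 'related_products']
```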

This feature is currently in Beta. We welcome your feedback and suggestions for improvement. Please don't hesitate to reach out to us at support@oxylabs.io or connect with our 24/7 live chat support.
