# OxyCopilot

**OxyCopilot** is a free [**Web Scraper API**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api) feature that makes onboarding easier and helps users find effective solutions for complex use cases, all without coding knowledge. OxyCopilot currently includes three separate features:

* **Scraper builder**
* [**Custom Parser**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/custom-parser) **builder**
* **Browser instructions builder**

{% hint style="success" %}
OxyCopilot is accessible on the [**Web Scraper API Playground**](https://dashboard.oxylabs.io/?route=/api-playground) in the Oxylabs dashboard.
{% endhint %}

{% embed url="<https://youtu.be/9JoF8_5r5HY?si=61c3Zkx6FrH06PVa>" %}

## Scraper builder

OxyCopilot helps you configure a scraper (and form the request payload) for the Web Scraper API without needing to understand the documentation or field logic.

### How it works

#### **Step 1: Provide a URL and prompt**

* **URL:** Provide the URL that you want to scrape.
* **Prompt:** Describe your requirements (e.g., localization, JS rendering, etc.).

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2F8us2o21uKCH3bWvBzsuD%2FScreenshot%202024-09-24%20at%2016.19.08.png?alt=media&#x26;token=5e596058-5b7a-4b50-8066-6a1a753f51b2" alt="" width="563"><figcaption></figcaption></figure>

#### **Step 2: Parsing**

You have three options for handling parsing:

1. **Custom Parser**: Select "Add parsing instructions" to create your own parsing logic using the [**Custom Parser builder**](#custom-parser-builder).
2. **Dedicated Parser**: If the URL is from a website for which we provide a dedicated parser and you want to use it, select "Continue with Dedicated Parser".
3. **No Parsing**: Choose to proceed without parsing if structured data isn’t needed.

{% hint style="warning" %}
If the URL belongs to a website for which we have a dedicated parser, but you don’t need structured data, select "Continue with Dedicated Parser" and disable the parse parameter in the playground's settings. Avoid using the exit button, as it won’t save the pre-filled parameters.
{% endhint %}

<div><figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2Fm5u1SlidrTSQEilFuIau%2FScreenshot%202024-09-24%20at%2016.20.09.png?alt=media&#x26;token=18797e40-0e87-4cbf-8c85-7858e6a04ae6" alt="" width="375"><figcaption><p>If we don't have a dedicated parser</p></figcaption></figure> <figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FqpGMYN6v7I3qDSLv8U7e%2FScreenshot%202024-09-24%20at%2016.19.36.png?alt=media&#x26;token=c20a3576-aa02-4e02-947f-d59a89059f6e" alt="" width="375"><figcaption><p>If we have a dedicated parser</p></figcaption></figure></div>

#### **Step 3: Review the request**

Based on your prompt, OxyCopilot will prefill the necessary parameters in the Web Scraper API Playground. You will see the specific request code and parameters for your use case, and you can adjust the parameters if needed.

#### **Step 4: Submit the request and copy**

If everything looks good, submit the request to preview the output and check that it works as expected. Then copy the request code and reuse it in your own scraping jobs with the Web Scraper API.

### Example

#### URL

```
https://www.amazon.de/s?k=adidas
```

#### Prompt

{% code overflow="wrap" %}

```
Scrape the Amazon search page from the provided URL and localize the results to Poland.
```

{% endcode %}

#### AI-generated parameters (JSON)

```json
{
    "source": "amazon_search",
    "query": "adidas",
    "geo_location": "PL",
    "domain": "de"
}
```
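
As a rough sketch of how these parameters could be used outside the playground, the Python snippet below builds the same payload and prepares a POST request with the standard library. The endpoint URL `https://realtime.oxylabs.io/v1/queries`, the use of HTTP Basic authentication, and the `USERNAME`/`PASSWORD` placeholders are assumptions that this page does not state; the request code generated by the playground remains the authoritative version.

```python
import base64
import json
import urllib.request

# Assumed endpoint; confirm it against the request code shown in the playground.
API_URL = "https://realtime.oxylabs.io/v1/queries"


def build_payload() -> dict:
    """Return the parameters OxyCopilot generated for the example prompt."""
    return {
        "source": "amazon_search",
        "query": "adidas",
        "geo_location": "PL",
        "domain": "de",
    }


def build_request(payload: dict, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request with Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=180) as response:
        return json.loads(response.read().decode("utf-8"))
```

With valid credentials, `send(build_request(build_payload(), "USERNAME", "PASSWORD"))` would submit the job. Building the request separately from sending it makes the payload easy to inspect before any network call is made.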

#### AI-generated request code

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FyIuCJsCNXCjX37tb7LkI%2FScreenshot%202024-09-24%20at%2016.44.00.png?alt=media&#x26;token=707d951c-7443-4937-beae-a6624ce67780" alt=""><figcaption></figcaption></figure>

## Custom Parser builder

Leverage the [**Custom Parser**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/custom-parser) feature with OxyCopilot to build a parser without needing to write code or analyze website structures manually.

### How it works

#### **Step 1: Provide URL(s) and prompt**

* **URL(s):** You can provide up to **3 URLs** for which you want to generate parsing instructions. OxyCopilot uses the HTML of the provided URLs to determine the best logic for extracting the required fields.

{% hint style="info" %}
The more URLs you provide, the more robust the parsing instructions will be, as OxyCopilot identifies common patterns across similar pages. Note that additional URLs may increase the wait time for results.
{% endhint %}

* **Prompt:** The prompt is the key component in building a natural language schema, which serves as the basis for generating the actual parsing instructions. The prompt should clearly describe the fields that need to be parsed.

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FenZwnLUhIiYFHIKwD4ix%2FScreenshot%202024-09-24%20at%2017.57.15.png?alt=media&#x26;token=eca0603b-1451-40d0-b0c1-32c19f5b6644" alt=""><figcaption></figcaption></figure>

#### **Step 2 \[Optional]: Adjust parsing schema**

This step allows you to fine-tune the parsing schema to better meet your needs or troubleshoot any issues.

#### **Parsing schema overview**

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FaUeqARjA3AyoIAqGXppw%2FScreenshot%202024-09-24%20at%2017.59.27.png?alt=media&#x26;token=9274f830-c944-423f-9995-eb1f882f1e47" alt=""><figcaption></figcaption></figure>

This table visualizes the input used by the AI to generate parsing instructions. The schema defines which fields need to be parsed and consists of various object types (explained in the [**table**](#object-type-explanations) below).

Each item in the schema has:

* **Name** (required): Used as the object key in the parsing instructions and shown in the parsed data.
* **Description** (optional but recommended): Helps improve parsing accuracy.

### **Schema adjustments**

* **Reorder items**: Drag and drop items using the dots on the left side to change their order (only items within the same nesting level can be moved).
* **Edit items**: Click the edit icon to modify any field.
* **Delete items**: You can delete any item at the parent level.
* **Add new items**: Add new items to the parent level.

Once you update the schema, click the **"Refresh output"** button to regenerate the instructions and preview the parsed data.

### Object type explanations

<table><thead><tr><th width="208">Object type</th><th width="243">Description</th><th>Parsed data example</th></tr></thead><tbody><tr><td>String</td><td>A single text output</td><td><code>"title": "Example product title"</code></td></tr><tr><td>Number</td><td>A single number</td><td><code>"price": 9.99</code></td></tr><tr><td>Array of strings</td><td>A list of text outputs</td><td><code>"products": ["product 1", "product 2", "product 3"]</code></td></tr><tr><td>Array of numbers</td><td>A list of numbers</td><td><code>"pages": [1, 2, 3]</code></td></tr><tr><td>Array of objects</td><td>A list of objects/items, each having their own objects inside (<code>_items</code> block in the parsing instructions)</td><td><p></p><pre class="language-json"><code class="lang-json">"related_items": [
  {
    "title": "product 1",
    "price": 9.99
  },
  {
    "title": "product 2",
    "price": 15.99
  }
]
</code></pre></td></tr></tbody></table>
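
To make the object types more concrete, the following Python sketch checks whether parsed data has the shape a schema promises. The `CHECKERS` mapping and the `find_mismatches` helper are hypothetical and exist only for this illustration; they are not part of OxyCopilot or the Web Scraper API.

```python
from typing import Any


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical checkers keyed by the object type names used in the table.
CHECKERS = {
    "String": lambda v: isinstance(v, str),
    "Number": _is_number,
    "Array of strings": lambda v: isinstance(v, list) and all(isinstance(i, str) for i in v),
    "Array of numbers": lambda v: isinstance(v, list) and all(_is_number(i) for i in v),
    "Array of objects": lambda v: isinstance(v, list) and all(isinstance(i, dict) for i in v),
}


def find_mismatches(schema: dict, parsed: dict) -> list:
    """Return the names of schema fields that are missing or have the wrong shape."""
    return [
        name
        for name, object_type in schema.items()
        if name not in parsed or not CHECKERS[object_type](parsed[name])
    ]


schema = {"title": "String", "price": "Number", "related_items": "Array of objects"}
parsed = {
    "title": "Example product title",
    "price": 9.99,
    "related_items": [{"title": "product 1", "price": 9.99}],
}
print(find_mismatches(schema, parsed))  # prints []
```

A check like this can be a quick sanity test after regenerating instructions, since a non-empty result points at the fields whose schema entry may need a clearer description.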

### Working with array of objects

1. **Select "Array of objects"**: Selecting this option adds a child object and a button to the item.

<figure><img src="https://lh7-qw.googleusercontent.com/docsz/AD_4nXcnZ-xxFBAjZPzSJesa5bjbUj7wOQlGn7Ut4bxQzrRNbUmN0CkcfOZa23QRLma2vUsINNl6c5TOixopuBGdIk9iKFvWNpfpkF5s-zL9CKWxEEeJ40yZc6n2eqRsUw45HcWJjZikl4pERT-8-nF5Pno7kpQ?key=TW5rMlJ-s_BzFm7nRv1Dlw" alt=""><figcaption></figcaption></figure>

2. **Fill out object names**: To save the item to the schema, you must fill out the names of both the parent and child objects. Once done, the checkmark will turn green.

<figure><img src="https://lh7-qw.googleusercontent.com/docsz/AD_4nXffrjzhyFW4oiVj6MHaRGp7ysfkC1cVR4viQEWM5FBE3vhElH-ZRL5B796G6cfK5dNMvLtXafioTUoQaG-3QQTuaPLcq4UcsmA524hNW_IMjvw6pUdY-CRAHaYvyMkctNX0pp9qmWrxoOR3sNAwqwc8OpVT?key=TW5rMlJ-s_BzFm7nRv1Dlw" alt=""><figcaption></figcaption></figure>

3. **Child object requirement**: An "Array of objects" must have at least one child.

### Testing the instructions

By default, parsed data is based on the first URL provided in **Step 1**. You can also provide a different URL to test the parsing instructions:

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2Fj2thZaXqrk6EctGnn6BB%2FScreenshot%202024-09-24%20at%2018.14.45.png?alt=media&#x26;token=cc59a24b-4cb8-4bb8-a44b-083b5e6c58ca" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
Instructions are generated based on the initial URLs and do not account for the test URLs. Editing the prompt or URLs will reset the schema, requiring a full regeneration.
{% endhint %}

#### **Step 3: Copy/Save instructions and integrate into scraping jobs**

Once the instructions are satisfactory:

* Use the **"Copy"** button to copy the instructions and paste them into your scraper code.
* Alternatively, save the instructions to your Web Scraper API Playground session, adjust other request parameters, test, and then copy the complete request code in your preferred programming language.

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2FEXxaaJchNqJxygtZGtNr%2FScreenshot%202024-09-24%20at%2018.17.04.png?alt=media&#x26;token=42f6a232-635e-481f-b231-af57c70b93ca" alt=""><figcaption></figcaption></figure>

### Example

#### URL

```
https://sandbox.oxylabs.io/products/1
```

#### Prompt

{% code overflow="wrap" %}

```
I want to parse a product page. The parsed data should include the following fields:

- product_title: a text field containing the product title
- price: a numeric field containing the product price
- related_products: a list containing the titles of related products displayed below the main product information
```

{% endcode %}

#### Parsing schema

<table><thead><tr><th width="214">Object type</th><th width="209">Name*</th><th>Description</th></tr></thead><tbody><tr><td>String</td><td>product_title</td><td>Product title</td></tr><tr><td>Number</td><td>price</td><td>Product price</td></tr><tr><td>Array of strings</td><td>related_products</td><td>Related product titles below the main product information</td></tr></tbody></table>

#### Parsing instructions

```json
{
    "product_title": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [
                    "//h2[@class=\"title css-1k75zwy e1pl6npa11\"]/text()",
                    "//div[@class=\"product-info-wrapper css-m2w3q2 emlf3670\"]/h2/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[1]/div[2]/div[2]/h2/text()"
                ]
            },
            {
                "_fn": "regex_search",
                "_args": [
                    "^\\s*(.[\\s\\S]*?)\\s*$",
                    1
                ]
            }
        ]
    },
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": [
                    "//div[@class=\"price css-o7uf8d e1pl6npa6\"]/text()",
                    "//div[@class=\"product-info-wrapper css-m2w3q2 emlf3670\"]/div[4]/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[1]/div[2]/div[2]/div[4]/text()"
                ]
            },
            {
                "_fn": "amount_from_string"
            }
        ]
    },
    "related_products": {
        "_fns": [
            {
                "_fn": "xpath",
                "_args": [
                    "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]/a[1]/h4/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div/a[1]/h4/text()",
                    "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div/a[1]/h4/text()",
                    "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div/a[1]/h4/text()",
                    "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
                    "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
                    "//div/div[@class=\"product-card css-e8at8d eag3qlw10\"]//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
                    "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div//h4[@class=\"title css-7u5e79 eag3qlw7\"]/text()",
                    "//div[@id=\"__next\"]/main/div/div/div/div[2]/div[2]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
                    "//div[@class=\"related-products css-1rinft1 emlf3670\"]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()",
                    "//html[@lang=\"en\"]/body/div/main/div/div/div/div[2]/div[2]/div//a[@class=\"card-header css-o171kl eag3qlw2\"]/h4/text()"
                ]
            },
            {
                "_fn": "regex_search",
                "_args": [
                    "^\\s*(.[\\s\\S]*?)\\s*$",
                    1
                ]
            }
        ]
    }
}
```
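
The `regex_search` steps in these instructions all use the same pattern, which appears to trim leading and trailing whitespace from the extracted text. The sketch below reproduces that effect locally with Python's `re` module. It assumes the API's regex engine treats the pattern the same way Python does, and `trim_like_regex_search` is a hypothetical helper name.

```python
import re
from typing import Optional

# Pattern copied from the `regex_search` steps above (JSON escaping removed).
TRIM_PATTERN = r"^\s*(.[\s\S]*?)\s*$"


def trim_like_regex_search(text: str) -> Optional[str]:
    """Return capture group 1 of the pattern, or None when nothing matches."""
    match = re.search(TRIM_PATTERN, text)
    return match.group(1) if match else None


print(trim_like_regex_search("\n   The Legend of Zelda: Ocarina of Time  \n"))
# prints: The Legend of Zelda: Ocarina of Time
```

The second argument in the instructions (`1`) selects the first capture group, which is why only the trimmed text reaches the parsed output.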

#### Parsed data

```json
{
    "price": 91.99,
    "product_title": "The Legend of Zelda: Ocarina of Time",
    "related_products": [
        "The Legend of Zelda: Majora's Mask",
        "Indiana Jones and the Infernal Machine"
    ],
    "parse_status_code": 12000
}
```
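
One way to reuse copied instructions in a scraping job is to embed them in the request payload. The sketch below shows the idea with a shortened instruction set. The `source`, `parse`, and `parsing_instructions` keys are assumptions based on the Custom Parser documentation rather than on this page, and `build_custom_parser_payload` is a hypothetical helper.

```python
import json


def build_custom_parser_payload(url: str, instructions: dict) -> dict:
    """Combine a target URL with copied parsing instructions.

    The keys "source", "parse" and "parsing_instructions" are assumptions
    based on the Custom Parser documentation; verify them before use.
    """
    return {
        "source": "universal",
        "url": url,
        "parse": True,
        "parsing_instructions": instructions,
    }


# Shortened version of the example instructions: one XPath for the price field.
instructions = {
    "price": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": ['//div[@class="price css-o7uf8d e1pl6npa6"]/text()'],
            },
            {"_fn": "amount_from_string"},
        ]
    }
}

payload = build_custom_parser_payload("https://sandbox.oxylabs.io/products/1", instructions)
print(json.dumps(payload, indent=4))
```

Keeping the instructions in a separate variable makes it easier to swap in a regenerated set without touching the rest of the request.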

### Generating parsing instructions via an API

If you need many sets of parsing instructions to cover the variety of websites you work with, you can generate them via an API. See the [parsing instruction generator API](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/custom-parser/generating-parsing-instructions-via-api) documentation for details.

## Browser instructions builder

You can use OxyCopilot to build intricate page interaction scripts without analyzing site structure or manually writing the configuration for your [Browser Instructions](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/js-rendering-and-browser-control/browser-instructions).

### How it works

#### **Step 1: Provide a URL and a prompt**

* **URL:** Please provide a single URL to generate browser instructions for. OxyCopilot uses the HTML of the provided URL to determine how to script the webpage interactions you require.
* **Prompt:** The prompt is crucial for building the browser instructions. Please clearly state what actions you would like to be performed on the web page once it's open (e.g. "Scroll to the bottom, wait for the 'next page' button to load, click the 'next page' button").

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2F4oFl4O9MiTTU1WrMSmD0%2Fimage.png?alt=media&#x26;token=c3d4af49-f14a-4c90-9834-b73f37497132" alt=""><figcaption></figcaption></figure>

#### **Step 2 \[Optional]: Adjust browser instructions**

This step allows you to fine-tune the browser instruction sequence to better meet your needs or troubleshoot any issues.

#### **Browser instruction overview**

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2Fy3PXi97y667OluyZXb09%2Fimage.png?alt=media&#x26;token=864598a4-9b9f-40e6-a6a0-e3bc0976fd82" alt=""><figcaption></figcaption></figure>

Once OxyCopilot is done processing your input, it will show the browser instruction sequence it has created.

You may tweak the sequence by editing, adding or removing steps.

#### **Step 3: Copy/Save instructions and integrate into scraping jobs**

Once the instructions are satisfactory, you may save the instructions to your Web Scraper API Playground session, adjust other request parameters, test, and then copy the complete request code in your preferred programming language.

<figure><img src="https://63892162-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzrXw45naRpCZ0Ku9AjY1%2Fuploads%2F3dwrsRJAlICWwdXp0Ou0%2Fimage.png?alt=media&#x26;token=ab5b507d-c630-4864-95d1-1866738b0e90" alt=""><figcaption></figcaption></figure>
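
For orientation, the sketch below shows what a request payload for the example prompt from Step 1 (scroll to the bottom, wait for the "next page" button, click it) might look like when assembled in Python. Everything specific here is an assumption to verify against the Browser Instructions documentation: the instruction type names, the `selector` and `timeout_s` fields, the `render` and `browser_instructions` keys, and the hypothetical `a.next-page` selector and example URL. The sequence produced by OxyCopilot for your page takes precedence.

```python
import json

# Hypothetical selector for illustration; OxyCopilot derives real ones from the page HTML.
NEXT_PAGE_SELECTOR = {"type": "css", "value": "a.next-page"}


def build_browser_payload(url: str) -> dict:
    """Sketch a payload for: scroll down, wait for the button, click it."""
    browser_instructions = [
        {"type": "scroll_to_bottom", "timeout_s": 5},
        {"type": "wait_for_element", "selector": NEXT_PAGE_SELECTOR, "timeout_s": 5},
        {"type": "click", "selector": NEXT_PAGE_SELECTOR},
    ]
    return {
        "source": "universal",
        "url": url,
        "render": "html",  # assumed: browser instructions need JS rendering enabled
        "browser_instructions": browser_instructions,
    }


payload = build_browser_payload("https://example.com/products")
print(json.dumps(payload, indent=4))
```

The order of the list mirrors the order of actions in the prompt, which is why stating the steps in sequence in the prompt tends to matter.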

{% hint style="success" %}
We welcome your feedback and suggestions for improvement. Please don't hesitate to reach out to us at <support@oxylabs.io> or connect with our 24/7 live chat support.
{% endhint %}
