# AI-Scraper

## Overview

[**AI-Scraper**](https://aistudio.oxylabs.io/apps/scrape) is a scraping tool that extracts data from a single webpage. It identifies and parses relevant information based on a natural language prompt, then delivers results in either **JSON** (for automation and APIs) or **Markdown** format (best for readable outputs and AI workflows).

This AI scraper removes the need for CSS/XPath selectors or custom parsers, so it integrates seamlessly with various automation pipelines. **Automatic schema generation** and flexible output formats provide users with an easy way to extract clean, structured data without ever needing to maintain parsing logic.

You can preview the tool [**here**](https://aistudio.oxylabs.io/apps/scrape) and integrate it into your workflows via our Python/JavaScript SDKs, MCP server, or one of our third-party integrations.

## Key features

* **Natural language prompt-based extraction** – Define your needs in plain English, and the scrape agent will retrieve the relevant information.
* **Multiple output formats** – Choose JSON for structured workflows or Markdown for human-readable results and AI workflows.
* **Automatic schema generation** – Generate a schema automatically from a prompt or define it manually for precise JSON parsing.
* **Works on any public webpage** – Extract from e-commerce, news, blogs, or any other accessible source.

## How it works

To scrape a webpage with AI-Scraper, follow these steps:

1. **Provide the webpage URL** you want to scrape.
2. **Describe the data to extract** in natural language (e.g. “Get all product names and prices”).
3. **Select the output format** – structured JSON or Markdown.
4. **(Optional) Define a schema** – Let AI-Scraper generate one automatically, or provide your own OpenAPI schema for the exact structure you desire.

### Installation

To begin, make sure you have an AI Studio API key (or [get a free trial](https://aistudio.oxylabs.io/register) with 1000 credits) and Python 3.10 or later installed. Then install the `oxylabs-ai-studio` package using pip:

```sh
pip install oxylabs-ai-studio
```

### Code examples (Python)

The following examples show how to use `AiScraper` to extract data from a sample page.

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
import json

# Initialize the AI Scraper with your API key
scraper = AiScraper(api_key="YOUR_API_KEY")

# Generate a schema automatically from natural language
schema = scraper.generate_schema(prompt="want to parse developer, platform, type, price, game title, and genre (array)")
print(f"Generated schema: {schema}")

# Scrape a webpage and extract structured data
url = "https://sandbox.oxylabs.io/products/3"
result = scraper.scrape(
    url=url,
    output_format="json",
    schema=schema,
    render_javascript=False,
    geo_location="US",
)
# Print the scrape output as JSON
print("Results:")
print(json.dumps(result.data, indent=2))
```
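
If you would rather define the schema by hand than generate it, the sketch below shows what such a schema might look like for the fields used above. It assumes the `schema` argument accepts a plain Python dictionary in JSON Schema style (the format that OpenAPI schema objects build on). The structure returned by `generate_schema` may differ, so treat the layout and the `required` lists here as illustrative.

```python
import json

# Hypothetical hand-written schema for the game fields used in the example above.
# Assumption: the `schema` argument accepts a JSON Schema-style dictionary.
manual_schema = {
    "type": "object",
    "properties": {
        "games": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "developer": {"type": "string"},
                    "platform": {"type": "string"},
                    "type": {"type": "string"},
                    "price": {"type": "number"},
                    "title": {"type": "string"},
                    "genre": {"type": "array", "items": {"type": "string"}},
                },
                # Illustrative choice: only title and price are required here.
                "required": ["title", "price"],
            },
        }
    },
    "required": ["games"],
}

# Print the schema so it can be reviewed before use.
print(json.dumps(manual_schema, indent=2))
```

Under the same assumption, you could pass `schema=manual_schema` to `scraper.scrape(...)` in place of the generated schema.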

Learn more about AI-Scraper and Oxylabs AI Studio Python SDK in our [PyPI repository](https://pypi.org/project/oxylabs-ai-studio/). You can also check out our [AI Studio JavaScript SDK](https://github.com/oxylabs/oxylabs-ai-studio-js) guide for JS users.

### Request parameters

| Parameter                                          | Description                                                   | Default Value |
| -------------------------------------------------- | ------------------------------------------------------------- | ------------- |
| <mark style="background-color:green;">`url`</mark> | Target URL to scrape                                          | –             |
| `output_format`                                    | Output format (`json`, `markdown`)                            | `markdown`    |
| `schema`                                           | OpenAPI schema for structured extraction (mandatory for JSON) | –             |
| `render_javascript`                                | Enable JavaScript rendering                                   | `False`       |
| `geo_location`                                     | Proxy location in ISO2 format                                 | –             |

Parameters highlighted in green are mandatory.
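
The rules in the table can be checked before a request is sent. The following is a minimal sketch of such a check. `build_scrape_kwargs` is a hypothetical helper, not part of the SDK; it only mirrors the defaults and the "schema is mandatory for JSON" rule described above.

```python
from typing import Any, Optional

ALLOWED_FORMATS = {"json", "markdown"}


def build_scrape_kwargs(
    url: str,
    output_format: str = "markdown",
    schema: Optional[dict] = None,
    render_javascript: bool = False,
    geo_location: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate request parameters against the table above."""
    if not url:
        raise ValueError("url is mandatory")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if output_format == "json" and schema is None:
        raise ValueError("schema is mandatory when output_format is 'json'")

    kwargs: dict[str, Any] = {
        "url": url,
        "output_format": output_format,
        "render_javascript": render_javascript,
    }
    # Optional parameters are included only when set.
    if schema is not None:
        kwargs["schema"] = schema
    if geo_location is not None:
        kwargs["geo_location"] = geo_location
    return kwargs


kwargs = build_scrape_kwargs("https://sandbox.oxylabs.io/products/3")
print(kwargs)
```

The resulting dictionary could then be unpacked into a call such as `scraper.scrape(**kwargs)`.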

### Output samples

AI-Scraper can return parsed, ready-to-use output that is easy to integrate into your applications.

Here's what its JSON output looks like:

```json
{
  "games": [
    {
      "developer": "Nintendo EAD Tokyo",
      "platform": "wii",
      "type": "singleplayer",
      "price": 91.99,
      "title": "Super Mario Galaxy 2",
      "genre": [
        "Action",
        "Platformer"
      ]
    },
    {
      "developer": "Eidos Interactive",
      "platform": "wii",
      "type": null,
      "price": 80.99,
      "title": "Death Jr.: Root of Evil",
      "genre": [
        "Action",
        "Platformer",
        "3D"
      ]
    }
  ]
}
```
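
As a rough illustration of how this output might be consumed downstream, the sketch below loads a payload with the same structure and iterates over the entries. The sample data is copied from the output above; the `"unknown"` fallback for a missing `type` is an illustrative choice, not SDK behavior.

```python
import json

# Sample payload with the same structure as the JSON output above.
raw = """
{
  "games": [
    {"developer": "Nintendo EAD Tokyo", "platform": "wii", "type": "singleplayer",
     "price": 91.99, "title": "Super Mario Galaxy 2", "genre": ["Action", "Platformer"]},
    {"developer": "Eidos Interactive", "platform": "wii", "type": null,
     "price": 80.99, "title": "Death Jr.: Root of Evil",
     "genre": ["Action", "Platformer", "3D"]}
  ]
}
"""
data = json.loads(raw)

for game in data["games"]:
    # Fields the scraper could not find come back as null (None in Python).
    game_type = game["type"] or "unknown"
    genres = ", ".join(game["genre"])
    print(f"{game['title']}: {game['price']:.2f} ({game_type}; {genres})")

# Find the lowest-priced entry.
cheapest = min(data["games"], key=lambda g: g["price"])
print(f"Cheapest: {cheapest['title']}")
```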

Alternatively, you can set `output_format` to `markdown` to receive Markdown-formatted results instead of JSON.
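
A Markdown request could look like the sketch below, which reuses the `AiScraper` calls from the Python example. The `scrape_to_markdown` wrapper is hypothetical, and the assumption that `result.data` holds the Markdown text for this output format is not confirmed here. Running it requires network access and a valid API key.

```python
def scrape_to_markdown(url: str, api_key: str) -> str:
    """Return a page as Markdown (sketch; needs network access and a valid API key)."""
    # Imported lazily so this function can be defined even where the SDK is not installed.
    from oxylabs_ai_studio.apps.ai_scraper import AiScraper

    scraper = AiScraper(api_key=api_key)
    # No schema is passed: the parameter table lists it as mandatory for JSON only.
    result = scraper.scrape(url=url, output_format="markdown")
    # Assumption: for Markdown output, `result.data` contains the Markdown text.
    return str(result.data)


# Example call (not executed here):
# print(scrape_to_markdown("https://sandbox.oxylabs.io/products/3", "YOUR_API_KEY"))
```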

## Practical use cases

AI-Scraper can be applied to a wide variety of data collection tasks:

1. **Extract product details** – Gather product names, descriptions, and prices from e-commerce sites.
2. **Parse news articles** – Retrieve article titles, dates, authors, and body text.
3. **Scrape pricing pages** – Collect structured pricing information for competitor or market research.
4. **Extract job postings** – Capture job titles, locations, salaries, and posting dates from recruitment portals.


