AI-Scraper

Learn how to get web data from a single URL for your AI workflows using AI Studio.

Overview

AI-Scraper is a scraping tool that extracts data from a single webpage. It identifies and parses relevant information based on a natural language prompt, then delivers results in either JSON (for automation and APIs) or Markdown format (best for readable outputs and AI workflows).

AI-Scraper removes the need for CSS/XPath selectors or custom parsers, which makes it straightforward to plug into automation pipelines. Automatic schema generation and a choice of output formats let you extract clean, structured data without maintaining parsing logic.

You can preview the tool here and integrate it into your workflows through our Python/JavaScript SDKs, MCP server, or one of our third-party integrations.

Key features

  • Natural language prompt-based extraction – Define your needs in plain English, and the scrape agent will retrieve the relevant information.

  • Multiple output formats – Choose JSON for structured workflows or Markdown for human-readable results and AI workflows.

  • Automatic schema generation – Generate a schema automatically from a prompt or define it manually for precise JSON parsing.

  • Works on any public webpage – Extract from e-commerce, news, blogs, or any other accessible source.

How it works

To scrape a webpage with AI-Scraper, follow these steps:

  1. Provide the webpage URL you want to scrape.

  2. Describe the data to extract in natural language (e.g. “Get all product names and prices”).

  3. Select the output format – structured JSON or Markdown.

  4. Define a schema (required for JSON output) – Let AI-Scraper generate one automatically, or provide your own OpenAPI schema for the exact structure you want.
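
For step 4, a hand-written schema for the games example used later on this page might look like the sketch below. It assumes a JSON-Schema-style object of the kind OpenAPI schemas are built from; the exact shape AI-Scraper expects is not specified here, so compare it with the output of generate_schema before relying on it. The names GAME_SCHEMA and missing_fields are illustrative only and are not part of the SDK.

```python
# Hypothetical hand-written schema for the games example on this page.
# Assumption: a JSON-Schema-style object; verify the shape against generate_schema output.
GAME_SCHEMA = {
    "type": "object",
    "properties": {
        "games": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "developer": {"type": "string"},
                    "platform": {"type": "string"},
                    # "nullable" is an OpenAPI 3.0 keyword; the sample output contains a null type
                    "type": {"type": "string", "nullable": True},
                    "price": {"type": "number"},
                    "title": {"type": "string"},
                    "genre": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["title", "price"],
            },
        }
    },
    "required": ["games"],
}


def missing_fields(record: dict, schema: dict) -> list[str]:
    """Return the names listed in schema["required"] that are absent from record."""
    return [name for name in schema.get("required", []) if name not in record]


item_schema = GAME_SCHEMA["properties"]["games"]["items"]
print(missing_fields({"title": "Super Mario Galaxy 2"}, item_schema))  # ['price']
```

Writing the schema by hand trades convenience for control: field names and types stay fixed between runs, which can matter when downstream code depends on them.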

Installation

To begin, make sure you have an AI Studio API key (a free trial with 1,000 credits is available) and Python 3.10 or later installed. You can install the oxylabs-ai-studio package using pip:

pip install oxylabs-ai-studio
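
Before running the examples, it can help to confirm that the two prerequisites above are in place. The sketch below checks the interpreter version and looks for an API key in the environment. The environment variable name OXYLABS_AI_STUDIO_API_KEY is a hypothetical choice for this sketch, not a documented SDK convention.

```python
import os
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the installation notes


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the documented minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def read_api_key(env=os.environ, var_name="OXYLABS_AI_STUDIO_API_KEY"):
    """Return the API key from the environment, or None if it is unset or blank.

    The variable name is an assumption made for this sketch.
    """
    value = env.get(var_name, "").strip()
    return value or None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("API key present:", read_api_key() is not None)
```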

Code examples (Python)

The following example shows how to use AiScraper to generate a schema and extract data from a sample page.

from oxylabs_ai_studio.apps.ai_scraper import AiScraper
import json

# Initialize the AI Scraper with your API key
scraper = AiScraper(api_key="YOUR_API_KEY")

# Generate a schema automatically from natural language
schema = scraper.generate_schema(prompt="want to parse developer, platform, type, price, game title, and genre (array)")
print(f"Generated schema: {schema}")

# Scrape a webpage and extract structured data
url = "https://sandbox.oxylabs.io/products/3"
result = scraper.scrape(
    url=url,
    output_format="json",
    schema=schema,
    render_javascript=False,
    geo_location="US",
)
# Print the scrape output as JSON
print("Results:")
print(json.dumps(result.data, indent=2))

Learn more about AI-Scraper and the Oxylabs AI Studio Python SDK in our PyPI repository. If you work in JavaScript, see our AI Studio JavaScript SDK guide.

Request parameters

Parameter          Description                                                      Default value
url*               Target URL to scrape                                             -
output_format      Output format (json, markdown)                                   markdown
schema             OpenAPI schema for structured extraction (mandatory for JSON)    -
render_javascript  Enable JavaScript rendering                                      False
geo_location       Proxy location in ISO2 (two-letter country code) format, e.g. US -

* – mandatory parameters
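
The rules in the table can be checked on the client side before a request is sent. The helper below is a minimal sketch, not part of the SDK: build_scrape_kwargs is a hypothetical name, and the requirement that the URL be an absolute http(s) address is an assumption added for illustration. It returns keyword arguments using only the parameter names listed above.

```python
from urllib.parse import urlparse

ALLOWED_FORMATS = {"json", "markdown"}


def build_scrape_kwargs(url, output_format="markdown", schema=None,
                        render_javascript=False, geo_location=None):
    """Validate arguments against the parameter table and return keyword arguments."""
    parsed = urlparse(url or "")
    # Assumption for this sketch: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url is mandatory and must be an absolute http(s) URL")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if output_format == "json" and schema is None:
        raise ValueError("schema is mandatory when output_format is 'json'")

    kwargs = {
        "url": url,
        "output_format": output_format,
        "render_javascript": render_javascript,
    }
    # Optional parameters are only included when the caller supplied them.
    if schema is not None:
        kwargs["schema"] = schema
    if geo_location is not None:
        kwargs["geo_location"] = geo_location
    return kwargs


print(build_scrape_kwargs("https://sandbox.oxylabs.io/products/3", geo_location="US"))
```

The returned dictionary can then be unpacked into the call shown earlier, for example scraper.scrape(**build_scrape_kwargs(...)).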

Output samples

AI-Scraper can return parsed, ready-to-use output that is easy to integrate into your applications.

Here's what its JSON output looks like:

{
  "games": [
    {
      "developer": "Nintendo EAD Tokyo",
      "platform": "wii",
      "type": "singleplayer",
      "price": 91.99,
      "title": "Super Mario Galaxy 2",
      "genre": [
        "Action",
        "Platformer"
      ]
    },
    {
      "developer": "Eidos Interactive",
      "platform": "wii",
      "type": null,
      "price": 80.99,
      "title": "Death Jr.: Root of Evil",
      "genre": [
        "Action",
        "Platformer",
        "3D"
      ]
    }
  ]
}
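
Once the JSON output is available, it can be handled like any other Python data. The sketch below flattens a trimmed copy of the sample above into rows. It assumes the parsed result is a dictionary with a top-level "games" list, as in the sample; to_rows is an illustrative helper, not an SDK function.

```python
import json

# Trimmed copy of the sample output on this page (in real use, take it from result.data).
sample = json.loads("""
{
  "games": [
    {"developer": "Nintendo EAD Tokyo", "platform": "wii", "type": "singleplayer",
     "price": 91.99, "title": "Super Mario Galaxy 2", "genre": ["Action", "Platformer"]},
    {"developer": "Eidos Interactive", "platform": "wii", "type": null,
     "price": 80.99, "title": "Death Jr.: Root of Evil", "genre": ["Action", "Platformer", "3D"]}
  ]
}
""")


def to_rows(data):
    """Flatten the 'games' list into (title, price, genres) tuples, tolerating missing keys."""
    rows = []
    for game in data.get("games", []):
        genres = ", ".join(game.get("genre") or [])
        rows.append((game.get("title"), game.get("price"), genres))
    return rows


rows = to_rows(sample)
for title, price, genres in rows:
    print(f"{title}: {price} [{genres}]")
```

Using .get() with fallbacks keeps the loop working when a field comes back as null, as "type" does for the second game in the sample.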

Alternatively, you can set output_format to markdown to receive Markdown-formatted results instead of JSON.

Practical use cases

AI-Scraper can be applied to a wide variety of data collection tasks:

  1. Extract product details – Gather product names, descriptions, and prices from e-commerce sites.

  2. Parse news articles – Retrieve article titles, dates, authors, and body text.

  3. Scrape pricing pages – Collect structured pricing information for competitor or market research.

  4. Extract job postings – Capture job titles, locations, salaries, and posting dates from recruitment portals.
