AI-Scraper
Learn how to get web data from a single URL for your AI workflows using AI Studio.
Overview
AI-Scraper is a scraping tool that extracts data from a single webpage. It identifies and parses relevant information based on a natural language prompt, then delivers results in either JSON (for automation and APIs) or Markdown format (best for readable outputs and AI workflows).
AI-Scraper removes the need for CSS/XPath selectors or custom parsers, which makes it straightforward to plug into automation pipelines. Automatic schema generation and flexible output formats let you extract clean, structured data without maintaining parsing logic.
You can preview the tool here and integrate it into your workflows through our Python/JavaScript SDKs, MCP server, or one of our third-party integrations.
Key features
Natural language prompt-based extraction – Define your needs in plain English, and the scrape agent will retrieve the relevant information.
Multiple output formats – Choose JSON for structured workflows or Markdown for human-readable results and AI workflows.
Automatic schema generation – Generate a schema automatically from a prompt or define it manually for precise JSON parsing.
Works on any public webpage – Extract from e-commerce, news, blogs, or any other accessible source.
How it works
To scrape a webpage with AI-Scraper, follow these steps:
1. Provide the webpage URL you want to scrape.
2. Describe the data to extract in natural language (e.g. “Get all product names and prices”).
3. Select the output format – structured JSON or Markdown.
4. (Optional) Define a schema – Let AI-Scraper generate one automatically, or provide your own OpenAPI schema for the exact structure you desire.
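For the optional schema step, a hand-written schema might look like the sketch below. It assumes the schema is passed as a JSON-Schema-style dictionary (the form OpenAPI uses for schema objects). The field names mirror the sample prompt used in the code example on this page, and the exact shape AI-Scraper generates or expects may differ.

```python
import json

# Hypothetical hand-written schema for a page that lists video games.
# Assumption: the schema is a JSON-Schema-style dict, as in OpenAPI schema objects.
game_schema = {
    "type": "object",
    "properties": {
        "games": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "developer": {"type": "string"},
                    "platform": {"type": "string"},
                    "type": {"type": "string"},
                    "price": {"type": "number"},
                    "title": {"type": "string"},
                    # "genre" is a list of strings, matching "genre (array)" in the prompt
                    "genre": {"type": "array", "items": {"type": "string"}},
                },
            },
        }
    },
}

# Print the schema so it can be reviewed before it is used in a request
print(json.dumps(game_schema, indent=2))
```

Writing the schema by hand is mainly useful when downstream code depends on fixed field names and types; otherwise the automatically generated schema is usually the quicker route.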
Installation
To begin, make sure you have access to an AI Studio API key (or get a free trial with 1000 credits) and Python 3.10 or above installed. You can install the oxylabs-ai-studio package using pip:
pip install oxylabs-ai-studio
Code examples (Python)
The following example shows how to use AiScraper to generate a schema and extract data from a sample page.
import json

from oxylabs_ai_studio.apps.ai_scraper import AiScraper

# Initialize the AI Scraper with your API key
scraper = AiScraper(api_key="YOUR_API_KEY")

# Generate a schema automatically from natural language
schema = scraper.generate_schema(
    prompt="want to parse developer, platform, type, price, game title, and genre (array)"
)
print(f"Generated schema: {schema}")

# Scrape a webpage and extract structured data
url = "https://sandbox.oxylabs.io/products/3"
result = scraper.scrape(
    url=url,
    output_format="json",
    schema=schema,
    render_javascript=False,
    geo_location="US",
)

# Print the scrape output as JSON
print("Results:")
print(json.dumps(result.data, indent=2))
Learn more about AI-Scraper and the Oxylabs AI Studio Python SDK in our PyPI repository. JavaScript users can check out our AI Studio JavaScript SDK guide.
Request parameters
| Parameter | Description | Default value |
|---|---|---|
| url * | Target URL to scrape | – |
| output_format | Output format (json, markdown) | markdown |
| schema | OpenAPI schema for structured extraction (mandatory for JSON) | – |
| render_javascript | Enable JavaScript rendering | False |
| geo_location | Proxy location in ISO2 format | – |

* – mandatory parameters
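The rules in the table can be checked before a request is sent. The sketch below is a hypothetical helper, not part of the SDK: build_scrape_params is an illustrative name, and it only encodes what the table states (url is mandatory, output_format defaults to markdown, schema is required for JSON output, render_javascript defaults to False).

```python
from typing import Any, Optional

ALLOWED_FORMATS = {"json", "markdown"}


def build_scrape_params(
    url: str,
    output_format: str = "markdown",
    schema: Optional[dict] = None,
    render_javascript: bool = False,
    geo_location: Optional[str] = None,
) -> dict[str, Any]:
    """Check arguments against the parameter table and return them as keyword arguments."""
    if not url:
        raise ValueError("url is mandatory")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if output_format == "json" and schema is None:
        raise ValueError("schema is mandatory when output_format is 'json'")

    params: dict[str, Any] = {
        "url": url,
        "output_format": output_format,
        "render_javascript": render_javascript,
    }
    # Optional values are only included when they are set
    if schema is not None:
        params["schema"] = schema
    if geo_location is not None:
        params["geo_location"] = geo_location
    return params


# With only a URL, the defaults from the table apply
params = build_scrape_params("https://sandbox.oxylabs.io/products/3")
print(params)
```

Since the keys match the keyword arguments used in the code example, the resulting dictionary could be unpacked into the call, for instance scraper.scrape(**params).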
Output samples
AI-Scraper can return parsed, ready-to-use output that is easy to integrate into your applications.
Here's what its JSON output looks like:
{
  "games": [
    {
      "developer": "Nintendo EAD Tokyo",
      "platform": "wii",
      "type": "singleplayer",
      "price": 91.99,
      "title": "Super Mario Galaxy 2",
      "genre": [
        "Action",
        "Platformer"
      ]
    },
    {
      "developer": "Eidos Interactive",
      "platform": "wii",
      "type": null,
      "price": 80.99,
      "title": "Death Jr.: Root of Evil",
      "genre": [
        "Action",
        "Platformer",
        "3D"
      ]
    }
  ]
}
Alternatively, you can set output_format to markdown to receive Markdown-formatted results instead of JSON.
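Once the JSON output is available, it can be handled like any other Python data. The sketch below works on an inline copy of the JSON sample rather than a live response, and summarize is a hypothetical helper. Note that fields the page does not provide come back as null, so the code falls back to a placeholder.

```python
import json

# Inline copy of the JSON sample; in practice this data would come from result.data
sample_output = """
{
  "games": [
    {"developer": "Nintendo EAD Tokyo", "platform": "wii", "type": "singleplayer",
     "price": 91.99, "title": "Super Mario Galaxy 2",
     "genre": ["Action", "Platformer"]},
    {"developer": "Eidos Interactive", "platform": "wii", "type": null,
     "price": 80.99, "title": "Death Jr.: Root of Evil",
     "genre": ["Action", "Platformer", "3D"]}
  ]
}
"""

data = json.loads(sample_output)


def summarize(games):
    """Return one readable line per game, tolerating null fields."""
    rows = []
    for game in games:
        # JSON null becomes None in Python, so substitute a placeholder
        game_type = game.get("type") or "unknown"
        rows.append(f'{game["title"]} ({game_type}): {game["price"]:.2f}')
    return rows


for row in summarize(data["games"]):
    print(row)
```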
Practical use cases
AI-Scraper can be applied to a wide variety of data collection tasks:
Extract product details – Gather product names, descriptions, and prices from e-commerce sites.
Parse news articles – Retrieve article titles, dates, authors, and body text.
Scrape pricing pages – Collect structured pricing information for competitor or market research.
Extract job postings – Capture job titles, locations, salaries, and posting dates from recruitment portals.