Web Scraper API features

Learn about all the features available in the Oxylabs Web Scraper API.

Our all-in-one Web Scraper API includes a collection of built-in features designed to simplify, scale up, and automate public data extraction workflows. Refer to the following list of features and visit their documentation pages for in-depth configuration steps.

OxyCopilot

Develop web scrapers and parsers with OxyCopilot, an AI-powered feature available in the Web Scraper API Playground, by providing target URLs and describing your needs in plain English. To learn more about how OxyCopilot works and explore ready-to-use prompts, visit the OxyCopilot prompts and code samples library available on our website.

Cloud integration

The cloud integration feature automatically delivers job results directly to your Amazon S3, Google Cloud Storage, Alibaba OSS, or other S3-compatible storage. This way, you don't have to make additional requests to fetch the data from us.
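
As a rough sketch of how a job request might carry a delivery target, the Python snippet below builds a payload with a storage destination attached. The field names `storage_type` and `storage_url`, the storage identifiers, the `source` value, and the bucket path are all assumptions made for illustration, not confirmed parameter names; check the cloud integration documentation for the actual ones.

```python
from typing import Any

# Assumed storage identifiers, for illustration only.
SUPPORTED_STORAGE = {"s3", "gcs", "oss", "s3_compatible"}


def build_job_payload(url: str, storage_type: str, storage_url: str) -> dict[str, Any]:
    """Return a job payload that asks for results to be delivered to cloud storage."""
    if storage_type not in SUPPORTED_STORAGE:
        raise ValueError(f"unsupported storage type: {storage_type!r}")
    if not storage_url:
        raise ValueError("storage_url must not be empty")
    return {
        "source": "universal",         # hypothetical source name
        "url": url,
        "storage_type": storage_type,  # hypothetical field name
        "storage_url": storage_url,    # hypothetical field name
    }


payload = build_job_payload("https://example.com", "s3", "my-bucket/scraper-results")
```

Validating the destination before submitting the job is a design choice of this sketch: a rejected payload is cheaper to debug than a job whose results never arrive in the bucket.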

Batch queries

For efficient scraping operations, Web Scraper API allows you to submit up to 5,000 query or URL parameters per batch. Head to our documentation to learn more.
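
If you have more than 5,000 URLs, you would need to split them across several batches. The following is a minimal Python sketch of that splitting step only; the helper name `chunk_queries` and the example URLs are illustrative, and the snippet does not call the API.

```python
from typing import Iterator

MAX_BATCH_SIZE = 5000  # per-batch limit stated above


def chunk_queries(queries: list[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of `queries`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(queries), size):
        yield queries[start:start + size]


urls = [f"https://example.com/item/{i}" for i in range(12_000)]
batches = list(chunk_queries(urls))
print([len(b) for b in batches])  # [5000, 5000, 2000]
```

Each resulting list can then be submitted as one batch request.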

Custom Parser

When you want to parse the HTML of a web page but there's no dedicated parser for the target, you can use Custom Parser to craft your own parsing and data processing logic. You may also create, modify, and reuse Parser Presets by hosting them on our system. The self-healing capability draws on our fast-adapting infrastructure to keep your scraping operations working effectively as the target websites change.

Scheduler

For automatic execution of recurring scraping and parsing jobs, you can leverage the Scheduler feature to create schedules. We recommend using this feature together with cloud integration to retrieve data at specified intervals.

Custom browser instructions

Set up custom browser instructions to render JavaScript on web pages and execute browser actions like clicking, scrolling, and text input. Define them manually, via our step-by-step Playground interface, or let AI generate them from a plain English prompt.
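
When defining instructions manually, it can help to think of them as an ordered list of steps that the browser runs one after another. The Python sketch below models such a list and checks it before use. The step schema (`type`, `selector`, `value`, `y`) and the selectors are hypothetical and chosen for illustration; refer to the custom browser instructions documentation for the real format.

```python
from typing import Any

# Action names mirror the examples in the text; the schema itself is hypothetical.
ALLOWED_ACTIONS = {"click", "scroll", "input"}


def validate_instructions(steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Check that every step names a known action and carries the fields it needs."""
    for index, step in enumerate(steps):
        action = step.get("type")
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"step {index}: unknown action {action!r}")
        if action in {"click", "input"} and "selector" not in step:
            raise ValueError(f"step {index}: {action!r} needs a selector")
        if action == "input" and "value" not in step:
            raise ValueError(f"step {index}: 'input' needs a value")
    return steps


# Type a search term, submit the form, then scroll down to load more results.
instructions = validate_instructions([
    {"type": "input", "selector": "#search", "value": "laptop"},
    {"type": "click", "selector": "button[type=submit]"},
    {"type": "scroll", "y": 1200},
])
```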

XHR request capturing

Sometimes it is more convenient to extract the required data from one or more of the Fetch/XHR requests that a browser makes while loading the web page, rather than parsing the HTML. Fetch/XHR request capturing is a feature that lets you retrieve these requests as structured JSON data from dynamic content sources.

Markdown output

This feature allows you to request Markdown output as an alternative to HTML or parsed JSON results. These responses provide an easy-to-read format, simplifying result integration into various content workflows and AI tools. Markdown is especially useful when working with LLMs because it is lightweight and has a clear syntax.

