Browser Agent

Learn how to control your browser with an AI agent that mimics human actions through simple, natural language instructions.

Overview

Browser Agent is an AI browser automation tool from Oxylabs AI Studio. It simulates real user browsing by executing multi-step actions like clicking links, filling forms, scrolling, capturing screenshots, and then extracting structured data – all controlled through natural language prompts.

Unlike traditional automation frameworks (e.g., Puppeteer or Selenium), Browser Agent requires no static scraping rules or manual scripting. You can describe tasks in plain English or provide a sequence of steps, and the AI will carry them out just like a human would.

You can preview the tool here and integrate it into your workflows through our Python/JavaScript SDKs, the MCP server, or one of our third-party integrations.

Key features

  • Full browser control – execute clicks, text input, navigation, and scrolling.

  • Multi-step task execution – define browsing flows in natural language.

  • Multiple outputs – get results in JSON, Markdown, HTML, or PNG screenshots.

  • Dynamic content support – interact with JavaScript-rendered pages.

  • Schema-based extraction – request structured JSON after the browsing sequence completes.

How it works

To run tasks with Browser Agent, follow these steps:

  1. Enter the target URL.

  2. Describe the browsing process as:

    • Natural language prompt (e.g. “Open the pricing page, accept cookies, and extract all product names with prices.”)

    • Structured step list – provide an array of AI browser actions (click, type, navigate, wait, extract).

  3. Select output format: JSON, Markdown, HTML, or PNG screenshot.

  4. (Optional) If JSON is selected, define or auto-generate a schema to structure the gathered data.
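The SDK examples on this page pass the task to the agent as a single user_prompt string. If you prefer to think in terms of a structured step list, one option is to flatten the steps into a numbered prompt yourself. This is a minimal sketch under stated assumptions: the (action, detail) step format and the steps_to_prompt helper are hypothetical and not part of the SDK; only the action names come from the list above.

```python
from typing import Sequence, Tuple

# Hypothetical step format: (action, detail) pairs. The action names mirror the
# list above; the exact step schema the service accepts is an assumption.
ALLOWED_ACTIONS = {"click", "type", "navigate", "wait", "extract"}


def steps_to_prompt(steps: Sequence[Tuple[str, str]]) -> str:
    """Flatten (action, detail) steps into one numbered natural language prompt."""
    lines = []
    for number, (action, detail) in enumerate(steps, start=1):
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"Unsupported action: {action!r}")
        lines.append(f"{number}. {action.capitalize()}: {detail}")
    return "\n".join(lines)


prompt = steps_to_prompt([
    ("navigate", "open the pricing page"),
    ("click", "accept cookies"),
    ("extract", "all product names with prices"),
])
print(prompt)
```

The resulting string could then be passed as user_prompt to browser_agent.run, exactly like a hand-written prompt.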

Installation

To begin, make sure you have an API key (or get a free trial with 1,000 credits) and Python 3.10 or later installed. Install the oxylabs-ai-studio package with pip:

pip install oxylabs-ai-studio
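The examples below hard-code "<API_KEY>" as a placeholder. A common alternative is to read the key from an environment variable and confirm the Python version up front. A minimal sketch, assuming a variable named AI_STUDIO_API_KEY; that name and both helper functions are hypothetical, so use whatever your deployment defines.

```python
import os
import sys

# Hypothetical environment variable name; not defined by the SDK.
API_KEY_ENV_VAR = "AI_STUDIO_API_KEY"


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the Python 3.10+ requirement."""
    return tuple(version_info[:2]) >= (3, 10)


def load_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

With the key exported in your shell, you could then create the client as BrowserAgent(api_key=load_api_key()).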

Code examples (Python)

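The following examples show how to use Browser Agent for browsing and data extraction. The first one generates a schema from a short description of the fields you need, then asks the agent to search the sandbox store and return the result as JSON.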

from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

# Generate an extraction schema from a plain-language list of the fields you need.
schema = browser_agent.generate_schema(
    prompt="game name, platform, review stars and price"
)
print("schema: ", schema)

# Describe the task; the agent searches the store and fills in the schema fields.
prompt = "Find if there is game 'super mario odyssey' in the store. If there is, find the price. Use search bar to find the game."
url = "https://sandbox.oxylabs.io/"
result = browser_agent.run(
    url=url,
    user_prompt=prompt,
    output_format="json",
    schema=schema,
)
print(result.data)

The next example asks the agent to take a screenshot of the home page and saves the decoded image as screenshot.png.

import base64
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

result = browser_agent.run(
    url="https://sandbox.oxylabs.io/",
    user_prompt="Go to the website and take a screenshot of the home page",
    output_format="screenshot",
)

# The screenshot is returned base64-encoded; decode it and write it as a PNG file.
with open("screenshot.png", "wb") as f:
    f.write(base64.b64decode(result.data.content["data"]))
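The example above writes whatever the decoded payload contains. As a defensive variant, you can check the 8-byte PNG file signature before saving. A minimal sketch using only the standard library; save_png is a hypothetical helper, and the input is the same base64 string the example reads from result.data.content["data"].

```python
import base64
from pathlib import Path

# Every PNG file begins with this 8-byte signature (PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(b64_data: str, path: str = "screenshot.png") -> int:
    """Decode base64 screenshot data, verify it is a PNG, and write it to disk.

    Returns the number of bytes written.
    """
    # validate=True rejects non-base64 characters instead of silently skipping them.
    raw = base64.b64decode(b64_data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("Decoded data is not a PNG image")
    return Path(path).write_bytes(raw)
```

Usage would be save_png(result.data.content["data"]) in place of the open/write lines above.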

Learn more about Browser Agent and the Oxylabs AI Studio Python SDK on the package's PyPI page. If you use JavaScript, see our AI Studio JavaScript SDK guide.

Request parameters

  • url* – starting URL to browse.

  • user_prompt* – natural language prompt for extraction.

  • output_format – output format: json, markdown, html, or screenshot. Default: markdown.

  • schema – OpenAPI schema for structured extraction (mandatory for JSON).

  • geo_location – proxy location in ISO2 format.

* – mandatory parameters
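The rules in this list (two mandatory parameters, four allowed formats, schema required for JSON) can be checked locally before a request consumes credits. A minimal sketch, assuming run accepts these parameters as keyword arguments as in the examples above; build_run_kwargs is a hypothetical helper, and the two-letter check for geo_location is a simplification of the ISO2 format.

```python
from typing import Any, Dict, Optional

# Allowed values, taken from the parameter list above.
OUTPUT_FORMATS = {"json", "markdown", "html", "screenshot"}


def build_run_kwargs(
    url: str,
    user_prompt: str,
    output_format: str = "markdown",
    schema: Optional[Dict[str, Any]] = None,
    geo_location: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate parameters locally, then return kwargs for browser_agent.run(**kwargs)."""
    if not url or not user_prompt:
        raise ValueError("url and user_prompt are mandatory")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    if output_format == "json" and schema is None:
        raise ValueError("schema is required when output_format is 'json'")
    if geo_location is not None and not (len(geo_location) == 2 and geo_location.isalpha()):
        raise ValueError("geo_location should be a two-letter ISO code, e.g. 'US'")
    kwargs: Dict[str, Any] = {
        "url": url,
        "user_prompt": user_prompt,
        "output_format": output_format,
    }
    # Include optional parameters only when they were actually set.
    if schema is not None:
        kwargs["schema"] = schema
    if geo_location is not None:
        kwargs["geo_location"] = geo_location
    return kwargs
```

Failing fast on a missing schema avoids a round trip to the service for a request that cannot produce JSON output.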

Output samples

Browser Agent returns parsed results or screenshots that you can plug into your applications. Here's the JSON output of the first code example:

{
  "type": "json",
  "content": {
    "games": [
      {
        "game_name": "Super Mario Odyssey",
        "platform": "Nintendo Switch",
        "review_stars": null,
        "price": 89.99
      }
    ]
  }
}
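The content object follows your schema, and fields the agent could not find may come back as null, as review_stars does here. A minimal sketch of consuming this sample with the standard json module; how the SDK's result.data exposes the payload (dict or object) is not shown on this page, so the sketch starts from the raw JSON text, and the describe helper is hypothetical.

```python
import json

# The sample JSON output shown above, as text.
raw_output = """
{
  "type": "json",
  "content": {
    "games": [
      {"game_name": "Super Mario Odyssey", "platform": "Nintendo Switch",
       "review_stars": null, "price": 89.99}
    ]
  }
}
"""

payload = json.loads(raw_output)
games = payload["content"]["games"]


def describe(game: dict) -> str:
    """Format one game record, substituting 'n/a' for missing (null) ratings."""
    stars = game["review_stars"] if game["review_stars"] is not None else "n/a"
    return f'{game["game_name"]} ({game["platform"]}): price {game["price"]}, stars {stars}'


for game in games:
    print(describe(game))
```

Checking for None explicitly keeps a legitimate rating of 0 from being treated as missing.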

[Image: screenshot output of the second request]

Browser Agent supports multiple output formats, selected with the output_format parameter:

  • json – structured data using schema-based parsing.

  • markdown – readable text, well suited to LLM and automation workflows.

  • html – raw HTML of the page.

  • screenshot – PNG image of the browser content.
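When a script switches between these formats, it helps to save each result with a matching file extension and write mode. A minimal sketch under assumptions: the content types per format (a dict for json, a string for markdown and html, a base64 string for screenshot, as in the screenshot example) are assumed rather than documented here, and save_output and the extension mapping are hypothetical.

```python
import base64
import json
from pathlib import Path
from typing import Any

# Hypothetical mapping from output_format to a file extension.
EXTENSIONS = {"json": ".json", "markdown": ".md", "html": ".html", "screenshot": ".png"}


def save_output(output_format: str, content: Any, stem: str = "result") -> Path:
    """Write Browser Agent output to a file whose extension matches the format."""
    path = Path(stem + EXTENSIONS[output_format])
    if output_format == "json":
        # Structured data: serialize with indentation for readability.
        path.write_text(json.dumps(content, indent=2), encoding="utf-8")
    elif output_format == "screenshot":
        # Screenshots are assumed to be base64-encoded, as in the example above.
        path.write_bytes(base64.b64decode(content))
    else:
        # markdown and html are written as plain text.
        path.write_text(content, encoding="utf-8")
    return path
```

Only the screenshot branch writes bytes; the three text formats go through write_text with an explicit encoding.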

Practical use cases

Common uses for Browser Agent include:

  1. E-commerce checkout simulation – add items to the cart, apply a coupon, and verify the checkout flow.

  2. Travel search automation – enter destinations, apply filters, and extract flight or hotel prices.

  3. Job search scraping – search for a role, click through postings, and extract job details.

  4. Event & ticket discovery – navigate event sites, retrieve titles, dates, and prices.
