# Browser Agent

## Overview

[**Browser Agent**](https://aistudio.oxylabs.io/apps/browser_agent) is an AI browser automation tool from [**Oxylabs AI Studio**](https://aistudio.oxylabs.io/). It simulates real user browsing by executing multi-step actions like clicking links, filling forms, scrolling, capturing screenshots, and then extracting structured data – all controlled through natural language prompts.

Unlike traditional automation frameworks (e.g., Puppeteer or Selenium), Browser Agent requires no static scraping rules or manual scripting. You can describe tasks in plain English or provide a sequence of steps, and the AI will carry them out just like a human would.

You can preview the tool [**here**](https://aistudio.oxylabs.io/apps/browser_agent) and integrate it into your workflows via our Python/JavaScript SDKs, MCP server, or one of our third-party integrations.

## Key features

* **Full control through browser AI** – execute clicks, inputs, navigation, and scrolling.
* **Multi-step task execution** – define browsing flows in natural language.
* **Multiple outputs** – get results in JSON, Markdown, HTML, or PNG screenshots.
* **Dynamic content support** – interact with JavaScript-rendered pages.
* **Schema-based extraction** – request structured JSON after the browsing sequence completes.

## How it works

To run a task with Browser Agent, follow these steps:

1. **Enter the target URL.**
2. **Describe the browsing process as:**
   * **Natural language prompt** (e.g., “Open the pricing page, accept cookies, and extract all product names with prices.”)
   * **Structured step list** – provide an array of AI browser actions (`click`, `type`, `navigate`, `wait`, `extract`).
3. **Select output format:** JSON, Markdown, HTML, or PNG screenshot.
4. **(Optional) If JSON is selected**, define or auto-generate a schema to structure the gathered data.
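
The structured step list from step 2 can be pictured as plain data. The sketch below is illustrative only: the dictionary keys (`action`, `target`, `value`) and the `steps_to_prompt` helper are assumptions, not part of the Oxylabs SDK or an official step schema. It renders a list of steps into a single numbered prompt that could then be passed as `user_prompt`.

```python
# Hypothetical step format: these keys are illustrative, not an official schema.
ALLOWED_ACTIONS = {"click", "type", "navigate", "wait", "extract"}


def steps_to_prompt(steps: list[dict]) -> str:
    """Render a list of step dicts as one numbered natural language prompt."""
    lines = []
    for number, step in enumerate(steps, start=1):
        action = step["action"]
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"Unsupported action: {action!r}")
        target = step.get("target", "")
        value = step.get("value")
        # strip() removes the trailing space when a step has no target.
        text = f"{number}. {action} {target}".strip()
        if value is not None:
            text += f" with value '{value}'"
        lines.append(text)
    return "\n".join(lines)


steps = [
    {"action": "navigate", "target": "the pricing page"},
    {"action": "click", "target": "the 'Accept cookies' button"},
    {"action": "extract", "target": "all product names with prices"},
]
prompt = steps_to_prompt(steps)
print(prompt)
```

Keeping the steps as data makes it easy to reuse the same flow across several target URLs and to reject unsupported actions before a request is sent.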

### Installation

To begin, make sure you have an API key (or get a [free trial](https://aistudio.oxylabs.io/register) with 1000 credits) and Python 3.10 or above installed. You can install the `oxylabs-ai-studio` package using pip:

```sh
pip install oxylabs-ai-studio
```

### Code examples (Python)

The following example generates a schema from a prompt, then uses Browser Agent to search the sandbox store and return the result as structured JSON.

```python
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

schema = browser_agent.generate_schema(
    prompt="game name, platform, review stars and price"
)
print("schema: ", schema)

prompt = "Find if there is game 'super mario odyssey' in the store. If there is, find the price. Use search bar to find the game."
url = "https://sandbox.oxylabs.io/"
result = browser_agent.run(
    url=url,
    user_prompt=prompt,
    output_format="json",
    schema=schema,
)
print(result.data)
```

The example below captures a PNG screenshot while using Browser Agent.

```python
import base64
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

result = browser_agent.run(
    url="https://sandbox.oxylabs.io/",
    user_prompt="Go to the website and take a screenshot of the home page",
    output_format="screenshot",
)

with open("screenshot.png", "wb") as f:
    f.write(base64.b64decode(result.data.content["data"]))
```
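
Before writing a screenshot to disk, it can help to confirm that the decoded bytes really are a PNG. The sketch below is a minimal, SDK-independent helper. The name `save_png` is hypothetical, and it assumes the screenshot arrives as a base64 string, as in the example above.

```python
import base64
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def save_png(b64_data: str, path: str) -> int:
    """Decode base64 data, verify the PNG signature, and write it to path.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_data)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("Decoded data does not look like a PNG image")
    return Path(path).write_bytes(raw)


# Demo with a signature-only payload instead of a real screenshot.
demo_payload = base64.b64encode(PNG_SIGNATURE).decode("ascii")
demo_path = Path(tempfile.gettempdir()) / "browser_agent_demo.png"
written = save_png(demo_payload, str(demo_path))
print(written)
```

With a real response, the first argument would be the base64 string read from the result, for example `result.data.content["data"]`.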

Learn more about Browser Agent and Oxylabs AI Studio Python SDK in our [PyPI repository](https://pypi.org/project/oxylabs-ai-studio/).\
You can also check out our [AI Studio JavaScript SDK](https://github.com/oxylabs/oxylabs-ai-studio-js?tab=readme-ov-file#oxylabs-ai-studio-javascript-sdk) guide for JS users.

### Request parameters

| Parameter                                                  | Description                                                   | Default Value |
| ---------------------------------------------------------- | ------------------------------------------------------------- | ------------- |
| <mark style="background-color:green;">`url`</mark>         | Starting URL to browse                                        | –             |
| <mark style="background-color:green;">`user_prompt`</mark> | Natural language prompt for extraction                        | –             |
| `output_format`                                            | Output format (`json`, `markdown`, `html`, `screenshot`)      | `markdown`    |
| `schema`                                                   | OpenAPI schema for structured extraction (mandatory for JSON) | –             |
| `geo_location`                                             | Proxy location in ISO2 format                                 | –             |

Parameters highlighted in green (`url` and `user_prompt`) are mandatory.
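
The constraints in the table can be checked locally before a request is sent. The following sketch is an illustration, not SDK code: `build_run_kwargs` is a hypothetical helper, and it assumes `browser_agent.run()` accepts the table's parameters as keyword arguments. It enforces the rules stated above: `url` and `user_prompt` are mandatory, `markdown` is the default format, and a schema is required for JSON output.

```python
from typing import Any, Optional

VALID_FORMATS = {"json", "markdown", "html", "screenshot"}


def build_run_kwargs(
    url: str,
    user_prompt: str,
    output_format: str = "markdown",
    schema: Optional[dict] = None,
    geo_location: Optional[str] = None,
) -> dict[str, Any]:
    """Validate request parameters and return them as keyword arguments."""
    if not url or not user_prompt:
        raise ValueError("url and user_prompt are mandatory")
    if output_format not in VALID_FORMATS:
        raise ValueError(f"Unsupported output_format: {output_format!r}")
    if output_format == "json" and schema is None:
        raise ValueError("schema is mandatory when output_format is 'json'")

    kwargs: dict[str, Any] = {
        "url": url,
        "user_prompt": user_prompt,
        "output_format": output_format,
    }
    # Optional parameters are only included when they were provided.
    if schema is not None:
        kwargs["schema"] = schema
    if geo_location is not None:
        kwargs["geo_location"] = geo_location
    return kwargs


kwargs = build_run_kwargs(
    url="https://sandbox.oxylabs.io/",
    user_prompt="Take a screenshot of the home page",
    output_format="screenshot",
    geo_location="US",
)
print(kwargs)
```

The returned dictionary could then be unpacked into the call, for example `browser_agent.run(**kwargs)`.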

### Output samples

Browser Agent can return parsed results or screenshots that are easy to integrate into your applications. Here's what our JSON output looks like:

```json
{
  "type": "json",
  "content": {
    "games": [
      {
        "game_name": "Super Mario Odyssey",
        "platform": "Nintendo Switch",
        "review_stars": null,
        "price": 89.99
      }
    ]
  }
}
```
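
A response shaped like the sample above can be flattened into rows for further processing. This sketch works on a hard-coded copy of the sample rather than a live response. The `games` key and its fields come from the schema used in this example, so other schemas will produce different keys.

```python
import json

# Hard-coded copy of the sample output shown above.
sample = """
{
  "type": "json",
  "content": {
    "games": [
      {
        "game_name": "Super Mario Odyssey",
        "platform": "Nintendo Switch",
        "review_stars": null,
        "price": 89.99
      }
    ]
  }
}
"""

payload = json.loads(sample)

# Collect one (name, platform, price) tuple per extracted game.
rows = [
    (game["game_name"], game["platform"], game["price"])
    for game in payload["content"]["games"]
]

for name, platform, price in rows:
    print(f"{name} ({platform}): {price}")
```

Note that fields the agent could not find, such as `review_stars` here, come back as `null`, so downstream code should handle `None` values.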

Here is a screenshot output of our second request:

<figure><img src="https://github.com/oxylabs/browser-agent-py/raw/main/screenshot.png" alt=""><figcaption></figcaption></figure>

Browser Agent supports multiple output formats, selected with the `output_format` parameter:

* `json` – structured data using schema-based parsing.
* `markdown` – easy-to-read data, perfect for AI and automation workflows.
* `html` – raw HTML data of the webpage.
* `screenshot` – PNG image of the browser content.
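
When storing results, each output format maps naturally to a file extension. The mapping and the `result_filename` helper below are illustrative conventions only, not something the SDK prescribes.

```python
# Illustrative mapping from output format to a conventional file extension.
EXTENSIONS = {
    "json": ".json",
    "markdown": ".md",
    "html": ".html",
    "screenshot": ".png",
}


def result_filename(stem: str, output_format: str) -> str:
    """Return a file name for a result in the given output format."""
    if output_format not in EXTENSIONS:
        raise ValueError(f"Unknown output format: {output_format!r}")
    return stem + EXTENSIONS[output_format]


print(result_filename("home_page", "screenshot"))
```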

## Practical use cases

You can use Browser Agent in various ways, including:

1. **E-commerce checkout simulation** – add items to the cart, apply a coupon, and confirm the checkout flow.
2. **Travel search automation** – enter destinations, apply filters, and extract flight or hotel prices.
3. **Job search scraping** – search for a role, click through postings, and extract job details.
4. **Event & ticket discovery** – navigate event sites, retrieve titles, dates, and prices.

