# Browser Agent

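## Overview

[**Browser Agent**](https://aistudio.oxylabs.io/apps/browser_agent) is an AI browser automation tool from [**Oxylabs AI Studio**](https://aistudio.oxylabs.io/). It simulates real user browsing by performing multi-step actions such as clicking links, filling in forms, scrolling, and capturing screenshots, and then extracts structured data, all controlled by natural-language prompts.

Unlike traditional automation frameworks such as Puppeteer or Selenium, Browser Agent does not need static scraping rules or hand-written scripts. You describe the task in plain English or provide a list of steps, and the AI carries them out as a human would.

You can try the tool [**here**](https://aistudio.oxylabs.io/apps/browser_agent) and integrate it into your workflows through our Python/JavaScript SDKs, the MCP server, or one of our third-party integrations.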

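## Key features

* **Full control through browser AI** – performs clicks, typing, navigation, and scrolling.
* **Multi-step task execution** – define browsing flows in natural language.
* **Multiple outputs** – get results as JSON, Markdown, HTML, or a PNG screenshot.
* **Dynamic content support** – interacts with JavaScript-rendered pages.
* **Schema-based extraction** – request structured JSON once the browsing sequence completes.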

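## How it works

To run a task with the browser AI agent, follow these steps:

1. **Enter the target URL.**
2. **Describe the browsing process as either:**
   * **A natural-language prompt** (for example, "Open the pricing page, accept cookies, and extract all product names and prices.")
   * **A structured list of steps** – provide an array of AI browser actions (`click`, `type`, `navigate`, `wait`, `extract`).
3. **Choose the output format:** JSON, Markdown, HTML, or PNG screenshot.
4. **(Optional) If you choose JSON**, define or auto-generate a schema to structure the collected data.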

### Installation

Before you start, make sure you have access to an API key (or get a [free trial](https://aistudio.oxylabs.io/register) with 1000 credits) and that Python 3.10 or later is installed. Then install the `oxylabs-ai-studio` package with pip:

```sh
pip install oxylabs-ai-studio
```

### Code example (Python)

The following example shows how to use the browser AI agent to browse a site and extract data.

```python
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

schema = browser_agent.generate_schema(
    prompt="game name, platform, review stars and price"
)
print("schema: ", schema)

prompt = "Check whether the store has the game 'super mario odyssey'. If it does, find its price. Use the search bar to find the game."
url = "https://sandbox.oxylabs.io/"
result = browser_agent.run(
    url=url,
    user_prompt=prompt,
    output_format="json",
    schema=schema,
)
print(result.data)
```

The next example captures a PNG screenshot with Browser Agent.

```python
import base64
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

result = browser_agent.run(
    url="https://sandbox.oxylabs.io/",
    user_prompt="Go to the website and take a screenshot of the home page",
    output_format="screenshot",
)

with open("screenshot.png", "wb") as f:
    f.write(base64.b64decode(result.data.content["data"]))
```
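
The screenshot example writes whatever bytes the base64 decode produces. A slightly more defensive variant could check the result first. This is a minimal sketch: `save_png` is a hypothetical helper, not part of the SDK, and it assumes the payload is a base64-encoded PNG string like `result.data.content["data"]` in the example.

```python
import base64
from pathlib import Path

# The first eight bytes of every valid PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(b64_data: str, path: str) -> int:
    """Decode a base64 string, check that it looks like a PNG, and write it.

    Hypothetical helper (not part of the SDK). Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_data)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return Path(path).write_bytes(raw)
```

With a result from the screenshot request, the call would be `save_png(result.data.content["data"], "screenshot.png")`.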

Learn more about Browser Agent and the Oxylabs AI Studio Python SDK in our [PyPI repository](https://pypi.org/project/oxylabs-ai-studio/).\
JavaScript users can also check our [AI Studio JavaScript SDK](https://github.com/oxylabs/oxylabs-ai-studio-js?tab=readme-ov-file#oxylabs-ai-studio-javascript-sdk) guide.

### Request parameters

| Parameter                                                  | Description                                                        | Default    |
| ---------------------------------------------------------- | ------------------------------------------------------------------ | ---------- |
| <mark style="background-color:green;">`url`</mark>         | Starting URL to browse                                             | –          |
| <mark style="background-color:green;">`user_prompt`</mark> | Natural-language prompt for extraction                             | –          |
| `output_format`                                            | Output format (`json`, `markdown`, `html`, `screenshot`)           | `markdown` |
| `schema`                                                   | OpenAPI schema for structured extraction (required for JSON output) | –          |
| `geo_location`                                             | Proxy location in ISO2 format                                      | –          |

Parameters highlighted in green are required.
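
The rules in the parameter table can be checked locally before a request is sent. The sketch below is one way to do that. `build_run_kwargs` is a hypothetical helper, not part of the SDK, and it assumes `run` accepts every parameter in the table as a keyword argument, as the examples on this page show for `url`, `user_prompt`, `output_format`, and `schema`.

```python
from typing import Any, Optional

# Output formats listed in the parameter table.
OUTPUT_FORMATS = {"json", "markdown", "html", "screenshot"}


def build_run_kwargs(
    url: str,
    user_prompt: str,
    output_format: str = "markdown",
    schema: Optional[dict[str, Any]] = None,
    geo_location: Optional[str] = None,
) -> dict[str, Any]:
    """Validate parameters against the table and return keyword arguments.

    Hypothetical helper (not part of the SDK).
    """
    if not url or not user_prompt:
        raise ValueError("url and user_prompt are required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if output_format == "json" and schema is None:
        raise ValueError("schema is required when output_format is 'json'")

    kwargs: dict[str, Any] = {
        "url": url,
        "user_prompt": user_prompt,
        "output_format": output_format,
    }
    # Optional parameters are only included when they are set.
    if schema is not None:
        kwargs["schema"] = schema
    if geo_location is not None:
        kwargs["geo_location"] = geo_location
    return kwargs
```

Under that assumption, the validated arguments could be passed on with `browser_agent.run(**build_run_kwargs(url, prompt, "json", schema))`.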

### Output samples

Browser Agent returns parsed results or screenshots that are easy to integrate into your applications. Here is a sample of the JSON output:

```json
{
  "type": "json",
  "content": {
    "games": [
      {
        "game_name": "Super Mario Odyssey",
        "platform": "Nintendo Switch",
        "review_stars": null,
        "price": 89.99
      }
    ]
  }
}
```
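
As an illustration of consuming this output, the sketch below turns the sample into one readable line per game and tolerates `null` fields such as `review_stars`. `summarize_games` is a hypothetical helper that works on a plain dictionary shaped like the sample. The field names come from the schema used in this example and will differ for other schemas.

```python
from typing import Any


def summarize_games(output: dict[str, Any]) -> list[str]:
    """Turn a JSON-format result into one readable line per game.

    Hypothetical helper; assumes the shape of the sample output.
    """
    if output.get("type") != "json":
        raise ValueError("expected a result with type 'json'")
    lines = []
    for game in output["content"].get("games", []):
        stars = game.get("review_stars")
        # JSON null arrives as None in Python, as review_stars does in the sample.
        rating = f"{stars} stars" if stars is not None else "no rating"
        lines.append(
            f"{game['game_name']} ({game['platform']}): {game['price']} [{rating}]"
        )
    return lines


sample = {
    "type": "json",
    "content": {
        "games": [
            {
                "game_name": "Super Mario Odyssey",
                "platform": "Nintendo Switch",
                "review_stars": None,
                "price": 89.99,
            }
        ]
    },
}

for line in summarize_games(sample):
    print(line)  # Super Mario Odyssey (Nintendo Switch): 89.99 [no rating]
```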

And here is the screenshot output from our second request:

<figure><img src="https://github.com/oxylabs/browser-agent-py/raw/main/screenshot.png" alt=""><figcaption></figcaption></figure>

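Browser Agent supports multiple output formats (`"output": "YOUR_FORMAT"`):

* `json` – structured data produced by schema-based parsing.
* `markdown` – easy-to-read data, well suited to AI and automation workflows.
* `html` – the raw HTML of the web page.
* `screenshot` – a PNG image of the browser content.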
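
When saving results to disk, it can help to derive the file extension from the chosen format. The mapping below is a small sketch of that idea. `output_path` and the extensions are illustrative choices, not something the SDK defines.

```python
from pathlib import Path

# Illustrative mapping from the four output formats to file extensions.
EXTENSIONS = {
    "json": ".json",
    "markdown": ".md",
    "html": ".html",
    "screenshot": ".png",
}


def output_path(basename: str, output_format: str) -> Path:
    """Return a file path whose suffix matches the output format.

    Hypothetical helper (not part of the SDK).
    """
    try:
        suffix = EXTENSIONS[output_format]
    except KeyError:
        raise ValueError(f"unsupported output format: {output_format!r}") from None
    return Path(basename).with_suffix(suffix)
```

For example, `output_path("result", "markdown")` gives `result.md`.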

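## Practical use cases

You can use the AI Browser Agent in a variety of scenarios, including:

1. **E-commerce checkout simulation** – add items to the cart, apply coupons, and confirm the checkout flow.
2. **Travel search automation** – enter destinations, apply filters, and extract flight or hotel prices.
3. **Job search scraping** – search for positions, click through the listings, and extract job details.
4. **Event and ticket discovery** – browse event sites and collect titles, dates, and prices.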

