# JS Rendering and Browser Control

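## JavaScript rendering

If the page you want to scrape needs JavaScript to dynamically load all the required data into the DOM, you can include the `render` parameter in your request instead of setting up and using custom browser instructions by hand. Requests with this parameter are fully rendered, and the data is stored as an HTML file or a PNG screenshot, depending on the value you specify.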

### HTML

Set the `render` parameter to `html` to get the raw output of the rendered page.

### PNG (screenshot)

Set the `render` parameter to `png` to get a Base64-encoded screenshot of the rendered page.
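Because the screenshot arrives as Base64 text, it has to be decoded before it can be opened as an image. The sketch below assumes the encoded image sits in `results[0]["content"]`, mirroring the result shape shown in the browser instructions section of this page. `decode_screenshot` and `save_screenshot` are hypothetical helper names, not part of the API.

```python
import base64


def decode_screenshot(response_json: dict) -> bytes:
    """Return raw image bytes from a response whose content is Base64 text.

    Assumption: the encoded image is stored in results[0]["content"].
    """
    encoded = response_json["results"][0]["content"]
    return base64.b64decode(encoded)


def save_screenshot(response_json: dict, path: str) -> None:
    """Decode the screenshot and write it to `path` as a binary file."""
    with open(path, "wb") as file:
        file.write(decode_screenshot(response_json))
```

Calling `save_screenshot(response.json(), "page.png")` on a response to a `"render": "png"` request would then write the image to disk, provided the assumption about the response shape holds.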

{% hint style="info" %}
If you want to scrape images and download them, refer to [**this section**](/products/cn/web-scraper-api/features/result-processing-and-storage/output-types/download-images.md).
{% endhint %}

### Request sample

{% tabs %}
{% tab title="cURL" %}

```shell
curl --user "user:pass" \
'https://realtime.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{"source": "universal", "url": "https://www.example.com", "render": "html"}'
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'universal',
    'url': 'https://www.example.com',
    'render': 'html',
}

# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)

# Instead of a response with the job status and results URL, this
# returns the JSON response with the result.
pprint(response.json())
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php

$params = [
    'source' => 'universal',
    'url' => 'https://www.example.com',
    'render' => 'html',
];

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");

$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);
echo $result;

if (curl_errno($ch)) {
    echo 'Error:' . curl_error($ch);
}
curl_close($ch);
```

{% endtab %}

{% tab title="C#" %}

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

namespace OxyApi
{
    class Program
    {
        static async Task Main()
        {
            const string Username = "YOUR_USERNAME";
            const string Password = "YOUR_PASSWORD";

            var parameters = new Dictionary<string, string>()
            {
                { "source", "universal" },
                { "url", "https://www.example.com" },
                { "render", "html" },
            };


            var client = new HttpClient();

            Uri baseUri = new Uri("https://realtime.oxylabs.io");
            client.BaseAddress = baseUri;

            var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/v1/queries");
            requestMessage.Content = JsonContent.Create(parameters);

            var authenticationString = $"{Username}:{Password}";
            var base64EncodedAuthenticationString = Convert.ToBase64String(System.Text.ASCIIEncoding.UTF8.GetBytes(authenticationString));
            requestMessage.Headers.Add("Authorization", "Basic " + base64EncodedAuthenticationString);

            var response = await client.SendAsync(requestMessage);
            var contents = await response.Content.ReadAsStringAsync();

            Console.WriteLine(contents);
        }
    }
}
```

{% endtab %}

{% tab title="Golang" %}

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	const Username = "YOUR_USERNAME"
	const Password = "YOUR_PASSWORD"

	payload := map[string]string{
		"source": "universal",
		"url": "https://www.example.com",
		"render": "html",
	}

	jsonValue, _ := json.Marshal(payload)

	client := &http.Client{}
	request, _ := http.NewRequest("POST",
		"https://realtime.oxylabs.io/v1/queries",
		bytes.NewBuffer(jsonValue),
	)

	request.SetBasicAuth(Username, Password)
	response, _ := client.Do(request)

	responseText, _ := ioutil.ReadAll(response.Body)
	fmt.Println(string(responseText))
}
```

{% endtab %}

{% tab title="Java" %}

```java
package org.example;

import okhttp3.*;
import org.json.JSONObject;

public class Main implements Runnable {
    private static final String AUTHORIZATION_HEADER = "Authorization";
    public static final String USERNAME = "YOUR_USERNAME";
    public static final String PASSWORD = "YOUR_PASSWORD";

    public void run() {
        JSONObject jsonObject = new JSONObject();
        jsonObject.put("source", "universal");
        jsonObject.put("url", "https://www.example.com");
        jsonObject.put("render", "html");

        Authenticator authenticator = (route, response) -> {
            String credential = Credentials.basic(USERNAME, PASSWORD);

            return response
                    .request()
                    .newBuilder()
                    .header(AUTHORIZATION_HEADER, credential)
                    .build();
        };

        var client = new OkHttpClient.Builder()
                .authenticator(authenticator)
                .build();

        var mediaType = MediaType.parse("application/json; charset=utf-8");
        var body = RequestBody.create(jsonObject.toString(), mediaType);
        var request = new Request.Builder()
                .url("https://realtime.oxylabs.io/v1/queries")
                .post(body)
                .build();

        try (var response = client.newCall(request).execute()) {
            assert response.body() != null;
            System.out.println(response.body().string());
        } catch (Exception exception) {
            System.out.println("Error: " + exception.getMessage());
        }

        System.exit(0);
    }

    public static void main(String[] args) {
        new Thread(new Main()).start();
    }
}
```

{% endtab %}

{% tab title="Node.js" %}

```javascript
import fetch from 'node-fetch';

const username = 'YOUR_USERNAME';
const password = 'YOUR_PASSWORD';
const body = {
  'source': 'universal',
  'url': 'https://www.example.com',
  'render': 'html'
};
const response = await fetch('https://realtime.oxylabs.io/v1/queries', {
  method: 'post',
  body: JSON.stringify(body),
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64'),
  }
});

console.log(await response.json());
```

{% endtab %}

{% tab title="HTTP" %}

```http
# The whole string you submit must be URL-encoded.

https://realtime.oxylabs.io/v1/queries?source=universal&url=https%3A%2F%2Fwww.example.com%2F&render=html&access_token=12345abcde
```

{% endtab %}

{% tab title="JSON" %}

```json
{
    "source": "universal", 
    "url": "https://www.example.com", 
    "render": "html"
}
```

{% endtab %}
{% endtabs %}

{% hint style="warning" %}
JavaScript rendering increases the time needed to scrape a page. If you use the Realtime or Proxy Endpoint integration method, set the client timeout to 180 seconds.
{% endhint %}
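As a minimal sketch of the timeout advice above, the following uses only the Python standard library to build the same request as the samples and to send it with a 180-second client timeout. The names `build_request`, `send`, and `CLIENT_TIMEOUT_S` are illustrative, and the credentials are placeholders.

```python
import base64
import json
import urllib.request

API_URL = "https://realtime.oxylabs.io/v1/queries"
CLIENT_TIMEOUT_S = 180  # client timeout suggested for rendered requests


def build_request(payload: dict, username: str, password: str) -> urllib.request.Request:
    """Build a POST request with a JSON body and HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request, waiting up to CLIENT_TIMEOUT_S seconds for a reply."""
    with urllib.request.urlopen(request, timeout=CLIENT_TIMEOUT_S) as response:
        return json.load(response)
```

With the `requests` library used in the Python tab, the equivalent is passing `timeout=180` to the request call.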

{% hint style="warning" %}
To keep traffic consumption as low as possible, our system does not load unnecessary resources while rendering a page.
{% endhint %}

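## Forced rendering for specific pages

Certain page types on specific domains need rendering because of their dynamic content, otherwise they cannot be scraped successfully. Our system automatically enforces rendering for these pages, even if you have not set it explicitly.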

{% hint style="warning" %}
Note that rendered jobs consume more traffic than non-rendered jobs.
{% endhint %}

We want users to be fully aware of this when scraping the following pages:

{% file src="/files/8f476b13b3dc64fd9f5984624acdc541e11ee2ce" %}

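This approach provides the best possible scraping experience and ensures data accuracy and reliability for these challenging pages.

If you want to disable rendering, add the following parameter to your request: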

```
"render": ""
```

## Browser instructions

You can define your own browser instructions, which are executed while JavaScript is being rendered.

{% hint style="success" %}
The easiest way to set up browser instructions is the AI-powered visual browser instruction builder in the [Web Scraper API Playground](https://dashboard.oxylabs.io/?route=/api-playground). Learn more [here](/products/cn/web-scraper-api/web-scraper-api-playground/oxycopilot.md#browser-instruction-builder).
{% endhint %}

### Usage

To use browser instructions, provide a list of `browser_instructions` when creating a job.

Suppose you want to search for the term `pizza boxes` on a website.

<figure><img src="/files/03fa85fee5fb2d4cf6db26ad65d017b1d606ff1f" alt=""><figcaption></figcaption></figure>

Sample job parameters look like this:

```json
{
    "source": "universal",
    "url": "https://www.ebay.com/",
    "render": "html",
    "browser_instructions": [
        {
            "type": "input",
            "value": "pizza boxes",
            "selector": {
                "type": "xpath",
                "value": "//input[@class='gh-tb ui-autocomplete-input']"
            }
        },
        {
            "type": "click",
            "selector": {
                "type": "xpath",
                "value": "//input[@type='submit']"
            }
        },
        {
            "type": "wait",
            "wait_time_s": 5
        }
    ]
}
```

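**Step 1.** You must provide the `"render": "html"` parameter.

**Step 2.** Describe the browser instructions in the `"browser_instructions"` field.

The sample browser instructions above enter the search term `pizza boxes` into the search field, click the `search` button, and wait 5 seconds for the content to load.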
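When a job needs more than a couple of steps, it can help to assemble the instruction list programmatically. The sketch below rebuilds the same payload as the JSON sample above; the helper functions `input_text`, `click`, and `wait` are hypothetical conveniences, not part of the API.

```python
import json


def input_text(xpath: str, value: str) -> dict:
    """Build an `input` instruction that targets an element by XPath."""
    return {
        "type": "input",
        "value": value,
        "selector": {"type": "xpath", "value": xpath},
    }


def click(xpath: str) -> dict:
    """Build a `click` instruction that targets an element by XPath."""
    return {"type": "click", "selector": {"type": "xpath", "value": xpath}}


def wait(seconds: int) -> dict:
    """Build a `wait` instruction that pauses for the given number of seconds."""
    return {"type": "wait", "wait_time_s": seconds}


payload = {
    "source": "universal",
    "url": "https://www.ebay.com/",
    "render": "html",  # required whenever browser instructions are used
    "browser_instructions": [
        input_text("//input[@class='gh-tb ui-autocomplete-input']", "pizza boxes"),
        click("//input[@type='submit']"),
        wait(5),
    ],
}

# Print the payload as JSON, ready to be sent as the request body.
print(json.dumps(payload, indent=4))
```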

The scraped result should look like this:

```json
{
  "results": [
    {
      "content": "<!doctype html><html>
        Content after executing the instructions
      </html>",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "page": 1,
      "url": "https://www.ebay.com/",
      "job_id": "7117835067442906113",
      "status_code": 200
    }
  ]
}
```

The scraped HTML should look like this:

<figure><img src="/files/fcbcfe381d35374d52ef46c09cb1afe43330910b" alt=""><figcaption></figcaption></figure>

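#### Fetching browser resources <a href="#fetching-browser-resources" id="fetching-browser-resources"></a>

We provide a standalone browser instruction for fetching browser resources.

The instruction is defined [here](#fetch_resource).

Using `fetch_resource` makes the job return the first Fetch/XHR resource that matches the provided pattern, instead of the HTML of the targeted page.

Suppose we want to target a GraphQL resource that is fetched when a product page is visited organically in a browser. We would provide the following job information: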

```json
{
    "source": "universal",
    "url": "https://www.example.com/product-page/123",
    "render": "html",
    "browser_instructions": [
        {
            "type": "fetch_resource",
            "filter": "/graphql/product-info/123"
        }
    ]
}
```

These instructions produce a result like this:

```json
{
  "results": [
    {
      "content": "{'product_id': 123, 'description': '', 'price': 123}",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "page": 1,
      "url": "https://example.com/v1/graphql/product-info/123/",
      "job_id": "7117835067442906114",
      "status_code": 200
    }
  ]
}
```
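To illustrate what "first matching resource" means, here is a small local sketch. It assumes the `filter` is applied as an unanchored regular expression search against each resource URL, which is consistent with the sample above (the filter is a fragment of the returned `url`) but is not confirmed by this page. The function name and the list of URLs are made up for the illustration.

```python
import re
from typing import List, Optional


def first_matching_resource(urls: List[str], pattern: str) -> Optional[str]:
    """Return the first URL the pattern matches anywhere, or None if none match."""
    compiled = re.compile(pattern)
    for url in urls:
        if compiled.search(url):
            return url
    return None


# Hypothetical Fetch/XHR requests observed while the page loads, in order.
requests_seen = [
    "https://example.com/static/app.js",
    "https://example.com/v1/graphql/product-info/123/",
    "https://example.com/v1/graphql/product-info/456/",
]

print(first_matching_resource(requests_seen, "/graphql/product-info/123"))
```

Under that assumption, a more specific pattern reduces the chance of capturing an unrelated request that happens to be fetched earlier.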

## List of supported browser instructions <a href="#list-of-supported-browser-instructions" id="list-of-supported-browser-instructions"></a>

### Generic parameters

All instructions defined below share a consistent set of parameters, listed here.

#### `type` <a href="#type" id="type"></a>

* **Type**: `Enum["click", "input", "scroll", "scroll_to_bottom", "wait", "wait_for_element", "fetch_resource"]`
* **Description**: Browser instruction type.

#### `timeout_s` <a href="#timeout_s" id="timeout_s"></a>

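* **Type**: `int`
* **Description**: The number of seconds after which the action is skipped if it has not completed in time.
* **Restrictions**: 0 < `timeout_s` <= 60
* **Default value**: 5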

#### `wait_time_s` <a href="#wait_time_s" id="wait_time_s"></a>

* **Type**: `int`
* **Description**: The number of seconds to wait before the next action is executed.
* **Restrictions**: 0 < `wait_time_s` <= 60
* **Default value**: 0

#### `on_error` <a href="#on_error" id="on_error"></a>

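* **Type**: `Enum["error", "skip"]`
* **Description**: Indicates what to do when this instruction fails:
  * `"error"`: Stop executing browser instructions.
  * `"skip"`: Continue with the next instruction.
* **Default value**: `"error"`

#### Generic parameters example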

```json
{
    "type": "wait_for_element",
    "selector": {
        "type": "text",
        "value": "Load More Items"
    },
    "timeout_s": 5,
    "wait_time_s": 2,
    "on_error": "skip"

}
```
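The documented limits can be checked on the client before a job is submitted. The following is a rough sketch of such a pre-flight check, based only on the restrictions listed above; `check_common_parameters` is a hypothetical helper and does not reproduce the API's own validation.

```python
SUPPORTED_TYPES = {
    "click", "input", "scroll", "scroll_to_bottom",
    "wait", "wait_for_element", "fetch_resource",
}
ON_ERROR_VALUES = {"error", "skip"}


def check_common_parameters(instruction: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if instruction.get("type") not in SUPPORTED_TYPES:
        problems.append(f"unsupported type: {instruction.get('type')!r}")
    for name in ("timeout_s", "wait_time_s"):
        if name in instruction:
            value = instruction[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not 0 < value <= 60:
                problems.append(f"{name} must be an int with 0 < {name} <= 60")
    # A missing on_error falls back to the documented default, "error".
    if instruction.get("on_error", "error") not in ON_ERROR_VALUES:
        problems.append("on_error must be 'error' or 'skip'")
    return problems
```

Only parameters that are actually present are range-checked, so an instruction that relies on the defaults passes.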

### Instructions <a href="#click" id="click"></a>

#### `click` <a href="#click" id="click"></a>

* **Description**: Clicks an element and waits for a set number of seconds.
* **Parameters**:
  * `type: str = "click"`
  * `selector: dict`
    * `type: Enum["xpath", "css", "text"]`
    * `value: str`

**Example:**

```json
{
    "type": "click",
    "selector": {
        "type": "xpath",
        "value": "//button"
    }
}
```

#### `input` <a href="#input" id="input"></a>

* **Description**: Enters text into the selected element.
* **Parameters**:
  * `type: str = "input"`
  * `selector: dict`
    * `type: Enum["xpath", "css", "text"]`
    * `value: str`
  * `value: str`

**Example:**

```json
{
    "type": "input",
    "selector": {
        "type": "xpath",
        "value": "//input"
    },
    "value": "pizza boxes"
}
```

#### `scroll` <a href="#scroll" id="scroll"></a>

* **Description**: Scrolls by a set number of pixels.
* **Parameters**:
  * `type: str = "scroll"`
  * `x: int`
  * `y: int`

**Example:**

```json
{
    "type": "scroll",
    "x": 0,
    "y": 100
}
```

#### `scroll_to_bottom` <a href="#scroll_to_bottom" id="scroll_to_bottom"></a>

* **Description**: Scrolls to the bottom for a set number of seconds.
* **Parameters**:
  * `type: str = "scroll_to_bottom"`

**Example:**

```json
{
    "type": "scroll_to_bottom",
    "timeout_s": 10
}
```

#### `wait` <a href="#wait" id="wait"></a>

* **Description**: Waits for a set number of seconds.
* **Parameters**:
  * `type: str = "wait"`

**Example:**

```json
{
    "type": "wait",
    "wait_time_s": 2
}
```

#### `wait_for_element` <a href="#wait_for_element" id="wait_for_element"></a>

* **Description**: Waits for an element to load for a set number of seconds.
* **Parameters**:
  * `type: str = "wait_for_element"`
  * `selector: dict`
    * `type: Enum["xpath", "css", "text"]`
    * `value: str`

**Example:**

```json
{
    "type": "wait_for_element",
    "selector": {
        "type": "text",
        "value": "Load More Items"
    },
    "timeout_s": 5
}
```

#### `fetch_resource` <a href="#fetch_resource" id="fetch_resource"></a>

{% hint style="warning" %}
The `fetch_resource` instruction must be the last one in the list of browser instructions; any instructions after it will not be executed.
{% endhint %}

* **Description**: Fetches the first Fetch/XHR resource that matches the given pattern.
* **Parameters**:
  * `type: str = "fetch_resource"`
  * `filter: str(RegEx expression)`
  * `on_error: Enum["error", "skip"]`

**Example:**

```json
{
    "type": "fetch_resource",
    "filter": "/graphql/item/"
}
```

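### Instruction validation

Any inconsistency in the instruction format results in a `400` status code and a corresponding error message.

For example, a payload like this: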

```json
{
    "source": "universal",
    "url": "https://www.example.com/", 
    "render": "html",
    "browser_instructions": [
        {
            "type": "unsupported-wait",
            "wait_time_s": 5
        }
    ]
}
```

will result in:

```json
{    
    "errors": {
        "message": "Unsupported action type `unsupported-wait`, choose from 'click,fetch_resource,input,scroll,scroll_to_bottom,wait,wait_for_element'"
    }
}
```
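A client can surface this message instead of failing on an unexpected body. The sketch below assumes validation errors follow the `{"errors": {"message": ...}}` shape shown above and falls back to the raw body otherwise; `validation_error_message` is a hypothetical helper name.

```python
import json
from typing import Optional


def validation_error_message(status_code: int, body: str) -> Optional[str]:
    """Return the error message for a 400 response, or None for other statuses."""
    if status_code != 400:
        return None
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return body  # not JSON: fall back to the raw response text
    if isinstance(parsed, dict):
        errors = parsed.get("errors")
        if isinstance(errors, dict) and "message" in errors:
            return errors["message"]
    return body  # unexpected shape: fall back to the raw response text
```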

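## Troubleshooting <a href="#status-codes" id="status-codes"></a>

### Status codes <a href="#status-codes" id="status-codes"></a>

See the response codes outlined [**here**](/products/cn/web-scraper-api/response-codes.md). Status codes related to instruction validation are documented [**here**](/products/cn/web-scraper-api/features/js-rendering-and-browser-control.md#instruction-validation).

### Errors and warnings

If your browser instructions produce an error or a warning, you will find it in the result under one of the following keys: `browser_instructions_error` or `browser_instructions_warnings`. For example, if you send the following browser instructions and the expected `xpath` cannot be found on the page, the result will contain a warning.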

`browser_instructions`:

```json
[
    {
        "type": "input", 
        "selector": {
            "type": "xpath",
            "value": "//input[@type='search']"
        },
        "value": "oxylabs"
    }
]
```

Result:

```json
{
  "results": [
    {
      "content": "<!doctype html><html>
        Content after executing the instructions
      </html>",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "browser_instructions_warnings": [
        {
          "action_type": "click",
          "msg": "Could not find selector type `xpath` with value `//input[@type=search]` on the page."
        }
      ],
      "page": 1,
      "url": "https://example.com",
      "job_id": "7117835067442906113",
      "status_code": 200
    }
  ]
}

```

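| Possible errors and warnings                                                               |
| ------------------------------------------------------------------------------------------ |
| An unexpected error occurred while converting browser instructions to actions.             |
| An unexpected error occurred while executing the `{action.type}` browser instruction.      |
| The `{action.type}` action timed out.                                                      |
| Could not find selector type `{selector.type}` with value `{selector.value}` on the page.  |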
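Since a job can finish with status code 200 and still carry warnings, it is worth checking both keys after every job. The sketch below walks the `results` list and collects whatever is stored under them; it makes no assumption about the shape of `browser_instructions_error`, and `collect_instruction_issues` is a hypothetical helper name.

```python
def collect_instruction_issues(response_json: dict) -> list:
    """Gather (job_id, key, value) tuples for every reported error or warning."""
    issues = []
    for result in response_json.get("results", []):
        for key in ("browser_instructions_error", "browser_instructions_warnings"):
            if key in result:
                issues.append((result.get("job_id"), key, result[key]))
    return issues
```

An empty list means neither key was present in any result.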
