# JS Rendering and Browser Control

## JavaScript Rendering

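If the page you want to scrape needs JavaScript to load all the necessary data into the DOM dynamically, you can include a `render` parameter in your request instead of setting up and using custom browser instructions manually. Requests with this parameter are fully rendered, and the data is stored as either an HTML file or a PNG screenshot, depending on the value you specify.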

### HTML

Set the `render` parameter to `html` to get the raw output of the rendered page.

### PNG (screenshot)

Set the `render` parameter to `png` to get a Base64-encoded screenshot of the rendered page.

{% hint style="info" %}
If you want to scrape and download images, see [**this section**](/products/cn/web-scraper-api/features/result-processing-and-storage/output-types/download-images.md).
{% endhint %}
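
Because the screenshot arrives as Base64 text, it has to be decoded before it can be saved as an image. The following is a minimal Python sketch of that step. It assumes the Base64 string sits in `results[0]["content"]`, mirroring where rendered HTML appears in the result examples on this page; the helper names `decode_screenshot` and `save_screenshot` are hypothetical.

```python
import base64

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response_json: dict) -> bytes:
    """Return raw PNG bytes from a `"render": "png"` response.

    Assumption: the Base64 string is stored in results[0]["content"].
    """
    encoded = response_json["results"][0]["content"]
    image_bytes = base64.b64decode(encoded)
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded content does not look like a PNG image")
    return image_bytes


def save_screenshot(response_json: dict, path: str) -> None:
    """Decode the screenshot and write it to `path`."""
    with open(path, "wb") as handle:
        handle.write(decode_screenshot(response_json))
```

Checking the PNG signature before writing the file is a cheap way to notice when the response contains an error body rather than an image.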

### Request example

{% tabs %}
{% tab title="cURL" %}

```shell
curl --user "user:pass" \
'https://realtime.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{"source": "universal", "url": "https://www.example.com", "render": "html"}'
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
from pprint import pprint

# Structure the payload.
payload = {
    'source': 'universal',
    'url': 'https://www.example.com',
    'render': 'html',
}

# Get the response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)

# Instead of a job status and a results URL, this returns
# a JSON response that contains the results.
pprint(response.json())
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php

$params = [
    'source' => 'universal',
    'url' => 'https://www.example.com',
    'render' => 'html',
];

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user" . ":" . "pass1");

$headers = array();
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);
echo $result;

if (curl_errno($ch)) {
    echo 'Error:' . curl_error($ch);
}
curl_close($ch);
```

{% endtab %}

{% tab title="C#" %}

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

namespace OxyApi
{
    class Program
    {
        static async Task Main()
        {
            const string Username = "YOUR_USERNAME";
            const string Password = "YOUR_PASSWORD";

            var parameters = new Dictionary<string, string>()
            {
                { "source", "universal" },
                { "url", "https://www.example.com" },
                { "render", "html" },
            };


            var client = new HttpClient();

            Uri baseUri = new Uri("https://realtime.oxylabs.io");
            client.BaseAddress = baseUri;

            var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/v1/queries");
            requestMessage.Content = JsonContent.Create(parameters);

            var authenticationString = $"{Username}:{Password}";
            var base64EncodedAuthenticationString = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(authenticationString));
            requestMessage.Headers.Add("Authorization", "Basic " + base64EncodedAuthenticationString);

            var response = await client.SendAsync(requestMessage);
            var contents = await response.Content.ReadAsStringAsync();

            Console.WriteLine(contents);
        }
    }
}
```

{% endtab %}

{% tab title="Golang" %}

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	const Username = "YOUR_USERNAME"
	const Password = "YOUR_PASSWORD"

	payload := map[string]string{
		"source": "universal",
		"url":    "https://www.example.com",
		"render": "html",
	}

	jsonValue, err := json.Marshal(payload)
	if err != nil {
		log.Fatal(err)
	}

	client := &http.Client{}
	request, err := http.NewRequest("POST",
		"https://realtime.oxylabs.io/v1/queries",
		bytes.NewBuffer(jsonValue),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Declare the JSON body and authenticate with Basic auth.
	request.Header.Set("Content-Type", "application/json")
	request.SetBasicAuth(Username, Password)

	response, err := client.Do(request)
	if err != nil {
		log.Fatal(err)
	}
	defer response.Body.Close()

	responseText, err := io.ReadAll(response.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(responseText))
}
```

{% endtab %}

{% tab title="Java" %}

```java
package org.example;

import okhttp3.*;
import org.json.JSONObject;

public class Main implements Runnable {
    private static final String AUTHORIZATION_HEADER = "Authorization";
    public static final String USERNAME = "YOUR_USERNAME";
    public static final String PASSWORD = "YOUR_PASSWORD";

    public void run() {
        JSONObject jsonObject = new JSONObject();
        jsonObject.put("source", "universal");
        jsonObject.put("url", "https://www.example.com");
        jsonObject.put("render", "html");

        Authenticator authenticator = (route, response) -> {
            String credential = Credentials.basic(USERNAME, PASSWORD);

            return response
                    .request()
                    .newBuilder()
                    .header(AUTHORIZATION_HEADER, credential)
                    .build();
        };

        var client = new OkHttpClient.Builder()
                .authenticator(authenticator)
                .build();

        var mediaType = MediaType.parse("application/json; charset=utf-8");
        var body = RequestBody.create(jsonObject.toString(), mediaType);
        var request = new Request.Builder()
                .url("https://realtime.oxylabs.io/v1/queries")
                .post(body)
                .build();

        try (var response = client.newCall(request).execute()) {
            assert response.body() != null;
            System.out.println(response.body().string());
        } catch (Exception exception) {
            System.out.println("Error: " + exception.getMessage());
        }

        System.exit(0);
    }

    public static void main(String[] args) {
        new Thread(new Main()).start();
    }
}
```

{% endtab %}

{% tab title="Node.js" %}

```javascript
import fetch from 'node-fetch';

const username = 'YOUR_USERNAME';
const password = 'YOUR_PASSWORD';
const body = {
  'source': 'universal',
  'url': 'https://www.example.com',
  'render': 'html'
};
const response = await fetch('https://realtime.oxylabs.io/v1/queries', {
  method: 'post',
  body: JSON.stringify(body),
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64'),
  }
});

console.log(await response.json());
```

{% endtab %}

{% tab title="HTTP" %}

```http
# The whole string you submit has to be URL-encoded.

https://realtime.oxylabs.io/v1/queries?source=universal&url=https%3A%2F%2Fwww.example.com%2F&render=html&access_token=12345abcde
```

{% endtab %}

{% tab title="JSON" %}

```json
{
    "source": "universal", 
    "url": "https://www.example.com", 
    "render": "html"
}
```

{% endtab %}
{% endtabs %}

{% hint style="warning" %}
Scraping a page with JavaScript rendering takes more time. If you use the Realtime or Proxy Endpoint integration methods, set the client-side timeout to 180 seconds.
{% endhint %}
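
As an illustration of the client-side timeout, here is a minimal Python sketch that uses only the standard library. The helper names `build_request` and `fetch_rendered` and the constant `RENDER_TIMEOUT_S` are hypothetical; the endpoint and payload shape come from the request examples above.

```python
import base64
import json
import urllib.request

API_URL = "https://realtime.oxylabs.io/v1/queries"
RENDER_TIMEOUT_S = 180  # client-side timeout suggested above for rendered jobs


def build_request(payload: dict, username: str, password: str) -> urllib.request.Request:
    """Build a POST request with a JSON body and Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def fetch_rendered(payload: dict, username: str, password: str) -> dict:
    """Send the job and wait up to RENDER_TIMEOUT_S seconds for the reply."""
    request = build_request(payload, username, password)
    with urllib.request.urlopen(request, timeout=RENDER_TIMEOUT_S) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_rendered({"source": "universal", "url": "https://www.example.com", "render": "html"}, "user", "pass")` would send the same job as the examples above, with the longer timeout applied.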

{% hint style="warning" %}
To keep traffic consumption as low as possible, our system does not load unnecessary assets during page rendering.
{% endhint %}

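## Forced rendering on specific pages

To be scraped successfully, certain page types on specific domains require rendering because of their dynamic content. Our system automatically enforces rendering for these pages, even if the user has not set it explicitly.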

{% hint style="warning" %}
Note that rendered jobs consume more traffic than non-rendered jobs.
{% endhint %}

We want users to be fully aware of this when scraping the following pages:

{% file src="/files/8f476b13b3dc64fd9f5984624acdc541e11ee2ce" %}

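This approach provides the best possible scraping experience and ensures accurate, reliable data from these challenging pages.

If you want to disable rendering, add the following parameter to your request: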

```
"render": ""
```
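
In a client, this amounts to sending an empty string as the `render` value. A minimal Python sketch, where the helper name `without_rendering` is hypothetical:

```python
def without_rendering(payload: dict) -> dict:
    """Return a copy of `payload` with rendering explicitly disabled."""
    updated = dict(payload)
    updated["render"] = ""  # an empty string opts out of forced rendering
    return updated


job = without_rendering({"source": "universal", "url": "https://www.example.com"})
print(job)
```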

## Browser instructions

You can define your own browser instructions, which are executed while JavaScript is being rendered.

{% hint style="success" %}
The easiest way to set up browser instructions is to use the AI-powered visual browser instruction builder in the [Web Scraper API Playground](https://dashboard.oxylabs.io/?route=/api-playground). Learn more [here](/products/cn/web-scraper-api/web-scraper-api-playground/oxycopilot.md#browser-instruction-builder).
{% endhint %}

### Usage

To use browser instructions, provide a list of `browser_instructions` when creating a job.

Let's say you want to search a website for the term `pizza boxes`.

<figure><img src="/files/03fa85fee5fb2d4cf6db26ad65d017b1d606ff1f" alt=""><figcaption></figcaption></figure>

Example job parameters:

```json
{
    "source": "universal",
    "url": "https://www.ebay.com/",
    "render": "html",
    "browser_instructions": [
        {
            "type": "input",
            "value": "pizza boxes",
            "selector": {
                "type": "xpath",
                "value": "//input[@class='gh-tb ui-autocomplete-input']"
            }
        },
        {
            "type": "click",
            "selector": {
                "type": "xpath",
                "value": "//input[@type='submit']"
            }
        },
        {
            "type": "wait",
            "wait_time_s": 5
        }
    ]
}
```

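**Step 1.** You must provide the `"render": "html"` parameter.

**Step 2.** Describe the browser instructions in the `"browser_instructions"` field.

The sample browser instructions above enter the search term `pizza boxes` into the search field, click the `search` button, and wait 5 seconds for the content to load.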
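
If you assemble instructions in code, small helper functions can keep the list readable. The following is a Python sketch that rebuilds the example payload above; the helper names `xpath`, `input_text`, `click`, and `wait` are hypothetical and not part of any SDK.

```python
import json


def xpath(value: str) -> dict:
    """Build an XPath selector object."""
    return {"type": "xpath", "value": value}


def input_text(selector: dict, value: str) -> dict:
    """Build an `input` instruction that types `value` into the element."""
    return {"type": "input", "value": value, "selector": selector}


def click(selector: dict) -> dict:
    """Build a `click` instruction for the selected element."""
    return {"type": "click", "selector": selector}


def wait(seconds: int) -> dict:
    """Build a `wait` instruction that pauses for `seconds` seconds."""
    return {"type": "wait", "wait_time_s": seconds}


payload = {
    "source": "universal",
    "url": "https://www.ebay.com/",
    "render": "html",
    "browser_instructions": [
        input_text(xpath("//input[@class='gh-tb ui-autocomplete-input']"), "pizza boxes"),
        click(xpath("//input[@type='submit']")),
        wait(5),
    ],
}

print(json.dumps(payload, indent=4))
```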

The scraped result should look like this:

```json
{
  "results": [
    {
      "content": "<!doctype html><html>
        Content after executing the instructions
      </html>",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "page": 1,
      "url": "https://www.ebay.com/",
      "job_id": "7117835067442906113",
      "status_code": 200
    }
  ]
}
```

The scraped HTML should look like this:

<figure><img src="/files/fcbcfe381d35374d52ef46c09cb1afe43330910b" alt=""><figcaption></figcaption></figure>

#### Fetching browser resources <a href="#fetching-browser-resources" id="fetching-browser-resources"></a>

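We provide a standalone browser instruction for fetching browser resources.

The instruction is defined under [`fetch_resource`](#fetch_resource).

Using `fetch_resource` makes the job return the first occurrence of a Fetch/XHR resource that matches the provided format, instead of the HTML being targeted.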

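Let's say we want to target a GraphQL resource that is fetched when a product page is visited naturally in a browser. We would provide the job information like this: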

```json
{
    "source": "universal",
    "url": "https://www.example.com/product-page/123",
    "render": "html",
    "browser_instructions": [
        {
            "type": "fetch_resource",
            "filter": "/graphql/product-info/123"
        }
    ]
}
```

These instructions produce a result like this:

```json
{
  "results": [
    {
      "content": "{'product_id': 123, 'description': '', 'price': 123}",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "page": 1,
      "url": "https://example.com/v1/graphql/product-info/123/",
      "job_id": "7117835067442906114",
      "status_code": 200
    }
  ]
}
```
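
To get a feel for how a `filter` value selects a resource, the matching can be imitated locally. The Python sketch below assumes search-style regular expression matching (the pattern may match anywhere in the resource URL); the exact server-side semantics are not specified on this page. The function name and the list of observed URLs are hypothetical.

```python
import re
from typing import Iterable, Optional


def first_matching_resource(filter_pattern: str, resource_urls: Iterable[str]) -> Optional[str]:
    """Return the first URL that the pattern matches anywhere, or None."""
    compiled = re.compile(filter_pattern)
    for url in resource_urls:
        if compiled.search(url):
            return url
    return None


# Hypothetical Fetch/XHR requests observed while the page loads, in order.
observed = [
    "https://example.com/static/app.js",
    "https://example.com/v1/graphql/product-info/123/",
    "https://example.com/v1/graphql/product-info/456/",
]

print(first_matching_resource("/graphql/product-info/123", observed))
```

A filter that is too broad, such as `/graphql/`, would return whichever GraphQL resource happens to load first, so a more specific pattern is usually the safer choice.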

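## List of supported browser instructions <a href="#list-of-supported-browser-instructions" id="list-of-supported-browser-instructions"></a>

### Generic parameters

All instructions defined below share a consistent set of generic parameters, listed here.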

#### `type` <a href="#type" id="type"></a>

* **Type**: `Enum["click", "input", "scroll", "scroll_to_bottom", "wait", "wait_for_element", "fetch_resource"]`
* **Description**: The browser instruction type.

#### `timeout_s` <a href="#timeout_s" id="timeout_s"></a>

* **Type**: `int`
* **Description**: How long, in seconds, to wait before the action is skipped if it has not completed.
* **Restrictions**: 0 < `timeout_s` <= 60
* **Default value**: 5

#### `wait_time_s` <a href="#wait_time_s" id="wait_time_s"></a>

* **Type**: `int`
* **Description**: How long, in seconds, to wait before performing the next action.
* **Restrictions**: 0 < `wait_time_s` <= 60
* **Default value**: 0

#### `on_error` <a href="#on_error" id="on_error"></a>

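* **Type**: `Enum["error", "skip"]`
* **Description**: Indicates what to do with the remaining instructions when this instruction fails:
  * `"error"`: stop executing browser instructions.
  * `"skip"`: continue with the next instruction.
* **Default value**: `"error"`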

#### Generic parameters example

```json
{
    "type": "wait_for_element",
    "selector": {
        "type": "text",
        "value": "Load More Items"
    },
    "timeout_s": 5,
    "wait_time_s": 2,
    "on_error": "skip"

}
```
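
The restrictions above can be checked on the client before a job is submitted. The following Python sketch is a hypothetical helper, not part of the API; it treats the numeric limits as applying only when a value is actually supplied, which is an assumption made for this example.

```python
def check_common_params(instruction: dict) -> list:
    """Return a list of problems found in the generic parameters."""
    problems = []
    for name in ("timeout_s", "wait_time_s"):
        if name in instruction:
            value = instruction[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not 0 < value <= 60:
                problems.append(f"{name} must be an int with 0 < {name} <= 60")
    if instruction.get("on_error", "error") not in ("error", "skip"):
        problems.append('on_error must be "error" or "skip"')
    return problems


example = {
    "type": "wait_for_element",
    "selector": {"type": "text", "value": "Load More Items"},
    "timeout_s": 5,
    "wait_time_s": 2,
    "on_error": "skip",
}
print(check_common_params(example))
```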

### Instructions <a href="#click" id="click"></a>

#### `click` <a href="#click" id="click"></a>

* **Description**: Clicks an element and waits a set number of seconds.
* **Parameters**:
  * `type: str = "click"`
  * `selector: dict`
    * `type: Enum["xpath", "css", "text"]`
    * `value: str`

**Example**:

```json
{
    "type": "click",
    "selector": {
        "type": "xpath",
        "value": "//button"
    }
}
```

#### `input` <a href="#input" id="input"></a>

* **Description**: Enters text into the selected element.
* **Parameters**:
  * `type: str = "input"`
  * `selector: dict`
    * `type: Enum["xpath", "css", "text"]`
    * `value: str`
  * `value: str`

**Example**:

```json
{
    "type": "input",
    "selector": {
        "type": "xpath",
        "value": "//input"
    },
    "value": "pizza boxes"
}
```

#### `scroll` <a href="#scroll" id="scroll"></a>

* **Description**: Scrolls by a set number of pixels.
* **Parameters**:
  * `type: str = "scroll"`
  * `x: int`
  * `y: int`

**Example**:

```json
{
    "type": "scroll",
    "x": 0,
    "y": 100
}
```

#### `scroll_to_bottom` <a href="#scroll_to_bottom" id="scroll_to_bottom"></a>

* **Description**: Scrolls to the bottom of the page for a set number of seconds.
* **Parameters**:
  * `type: str = "scroll_to_bottom"`

**Example**:

```json
{
    "type": "scroll_to_bottom",
    "timeout_s": 10
}
```

#### `wait` <a href="#wait" id="wait"></a>

* **Description**: Waits for a set number of seconds.
* **Parameters**:
  * `type: str = "wait"`

**Example**:

```json
{
    "type": "wait",
    "wait_time_s": 2
}
```

#### `wait_for_element` <a href="#wait_for_element" id="wait_for_element"></a>

* **Description**: Waits for an element to load for a set number of seconds.
* **Parameters**:
  * `type: str = "wait_for_element"`
  * `selector: dict`
    * `type: Enum["xpath", "css", "text"]`
    * `value: str`

**Example**:

```json
{
    "type": "wait_for_element",
    "selector": {
        "type": "text",
        "value": "Load More Items"
    },
    "timeout_s": 5
}
```

#### `fetch_resource` <a href="#fetch_resource" id="fetch_resource"></a>

{% hint style="warning" %}
The `fetch_resource` instruction must be the last one in the list of browser instructions; any instructions after it will not be executed.
{% endhint %}

* **Description**: Fetches the first Fetch/XHR resource that matches the set pattern.
* **Parameters**:
  * `type: str = "fetch_resource"`
  * `filter: str(RegEx expression)`
  * `on_error: Enum["error", "skip"]`

**Example**:

```json
{
    "type": "fetch_resource",
    "filter": "/graphql/item/"
}
```

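### Instruction validation

Any inconsistency in the instruction format results in a `400` status code and a corresponding error message.

For example, the following payload: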

```json
{
    "source": "universal",
    "url": "https://www.example.com/",
    "render": "html",
    "browser_instructions": [
        {
            "type": "unsupported-wait",
            "wait_time_s": 5
        }
    ]
}
```

Will result in:

```json
{    
    "errors": {
        "message": "Unsupported action type `unsupported-wait`, choose from 'click,fetch_resource,input,scroll,scroll_to_bottom,wait,wait_for_element'"
    }
}
```
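
A client can catch the most common mistakes before sending a job and so avoid a `400` round trip. The Python sketch below checks the `type` of each instruction against the supported list and confirms that no instruction follows a `fetch_resource` step. Both helper names are hypothetical, and the server-side validation remains authoritative.

```python
SUPPORTED_TYPES = (
    "click", "fetch_resource", "input", "scroll",
    "scroll_to_bottom", "wait", "wait_for_element",
)


def find_unsupported_types(instructions: list) -> list:
    """Return the `type` values that are not in the supported list."""
    return [
        step.get("type")
        for step in instructions
        if step.get("type") not in SUPPORTED_TYPES
    ]


def fetch_resource_is_last(instructions: list) -> bool:
    """Return True when no instruction follows a `fetch_resource` step."""
    types = [step.get("type") for step in instructions]
    return "fetch_resource" not in types[:-1]


instructions = [{"type": "unsupported-wait", "wait_time_s": 5}]
print(find_unsupported_types(instructions))
```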

## Troubleshooting <a href="#status-codes" id="status-codes"></a>

### Status codes <a href="#status-codes" id="status-codes"></a>

See our list of response codes [**here**](/products/cn/web-scraper-api/response-codes.md). Status codes related to instruction validation are documented [**here**](/products/cn/web-scraper-api/features/js-rendering-and-browser-control.md#instruction-validation).

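### Errors and warnings

If your browser instructions produce an error or a warning, you will find it in the result under one of the following keys: `browser_instructions_error` or `browser_instructions_warnings`. For example, if you send the following browser instructions and the expected `xpath` is not found on the page, the result will contain a warning.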

`browser_instructions`:

```json
[
    {
        "type": "input", 
        "selector": {
            "type": "xpath",
            "value": "//input[@type='search']"
        },
        "value": "oxylabs"
    }
]
```

Result:

```json
{
  "results": [
    {
      "content": "<!doctype html><html>
        Content after executing the instructions
      </html>",
      "created_at": "2023-10-11 11:35:23",
      "updated_at": "2023-10-11 11:36:08",
      "browser_instructions_warnings": [
        {
          "action_type": "input",
          "msg": "Unable to find selector type `xpath` with value `//input[@type=search]` on the page."
        }
      ],
      "page": 1,
      "url": "https://example.com",
      "job_id": "7117835067442906113",
      "status_code": 200
    }
  ]
}

```

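| Possible errors and warnings                                                               |
| ------------------------------------------------------------------------------------------ |
| An unexpected error occurred while converting browser instructions to actions.             |
| An unexpected error occurred while executing the `{action.type}` browser instruction.      |
| Action `{action.type}` timed out.                                                          |
| Unable to find selector type `{selector.type}` with value `{selector.value}` on the page.  |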
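
Because a job can finish with status code `200` and still carry warnings, it may be worth checking these keys on every result. The following Python sketch assumes that `browser_instructions_warnings` is a list shaped like the example above and treats `browser_instructions_error` as an opaque value, since its exact shape is not shown on this page. The function name is hypothetical.

```python
def collect_instruction_issues(response_json: dict) -> list:
    """Gather browser-instruction errors and warnings from every result."""
    issues = []
    for result in response_json.get("results", []):
        job_id = result.get("job_id")
        error = result.get("browser_instructions_error")
        if error:
            issues.append(("error", job_id, error))
        for warning in result.get("browser_instructions_warnings") or []:
            issues.append(("warning", job_id, warning))
    return issues


sample = {
    "results": [
        {
            "job_id": "7117835067442906113",
            "status_code": 200,
            "browser_instructions_warnings": [
                {"action_type": "input", "msg": "selector not found"},
            ],
        }
    ]
}

for level, job_id, detail in collect_instruction_issues(sample):
    print(level, job_id, detail)
```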

