# Result Aggregator

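The **Result Aggregator** lets you collect many small results from different scraping or parsing jobs into a single aggregated file. This is especially useful when you run a large number of jobs that each return a small file and those files can be combined into larger output sets, or when you need to process results as batch files (JSON, JSONL, or Gzip).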

Aggregated responses can be delivered to your [cloud storage](/products/cn/web-scraper-api/features/result-processing-and-storage/cloud-storage.md) (Google Cloud Storage, Amazon S3, or another S3-compatible service).

## How to use

{% stepper %}
{% step %}

### Create an aggregator

First, define an aggregator instance with a storage destination for delivery and the conditions that trigger delivery.

#### Request example

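The following request creates an aggregator that uploads a batch file every hour (`schedule`), when the file reaches 500 MB (`524288000` bytes, set by `max_size_bytes`), or when it holds 10,000 results (`max_result_count`), whichever comes first.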

```bash
curl -X POST https://data.oxylabs.io/v1/aggregators \
-u "USERNAME:PASSWORD" \
-H "Content-Type: application/json" \
-d '{
  "name": "amazon_hourly",
  "storage_type": "s3",
  "storage_url": "s3://my_bucket/batches",
  "max_result_count": 10000,
  "max_size_bytes": 524288000,
  "schedule": "0 */1 * * *"
}'
```
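If you create aggregators from code rather than the shell, the same request can be built with Python's standard library. The sketch below only constructs the request without sending it; the endpoint and body fields are copied from the curl example above, `USERNAME`/`PASSWORD` are placeholders for your API credentials, and `build_create_request` is a hypothetical helper name, not part of any Oxylabs SDK.

```python
import base64
import json
import urllib.request

# Endpoint taken from the curl example above.
AGGREGATORS_URL = "https://data.oxylabs.io/v1/aggregators"


def build_create_request(username: str, password: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates an aggregator."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        AGGREGATORS_URL,
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Equivalent of curl's -u "USERNAME:PASSWORD" (HTTP Basic auth).
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Same body as the curl example above.
config = {
    "name": "amazon_hourly",
    "storage_type": "s3",
    "storage_url": "s3://my_bucket/batches",
    "max_result_count": 10000,
    "max_size_bytes": 524288000,
    "schedule": "0 */1 * * *",
}
request = build_create_request("USERNAME", "PASSWORD", config)
# To actually send it: urllib.request.urlopen(request)
```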

#### Request parameters

| Parameter                                                             | Description                                                                                                                            | Type      |
| --------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | --------- |
| <mark style="color:default;background-color:green;">`name`</mark>         | Unique aggregator identifier.                                                                                                          | `string`  |
| <mark style="color:default;background-color:green;">`storage_type`</mark> | Storage provider (`s3`, `gcs`, or `s3_compatible`).                                                                                    | `string`  |
| <mark style="color:default;background-color:green;">`storage_url`</mark>  | Destination bucket/container path.                                                                                                     | `string`  |
| `file_output_type`                                                    | Output format (`json`, `jsonl`, `gzip_json`, or `gzip_jsonl`).                                                                         | `string`  |
| `max_size_bytes`                                                      | <p>Maximum batch size, in bytes.</p><p>Maximum: <strong>1GB</strong>.</p>                                                                | `integer` |
| `schedule`                                                            | <p>Aggregation frequency, as a <strong>cron expression</strong> (e.g., <code>0 \*/1 \* \* \*</code> for once per hour).<br>Maximum: <strong>1h</strong>.</p> | `string`  |
| `max_result_count`                                                    | Maximum number of results per batch; delivery is triggered when this count is reached.                                                 | `integer` |
| `callback_url`                                                        | URL of your callback endpoint. [**More info**](/products/cn/web-scraper-api/integration-methods/push-pull.md#callback)                 | `string`  |

Parameters highlighted in green are required.
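Before sending a configuration, you may want to catch obvious mistakes locally. The following is a minimal client-side check based only on the parameter table; `validate_aggregator_config` is a hypothetical helper, and it assumes that the 1GB limit means 1 GiB (1073741824 bytes), by analogy with 500 MB being written as `524288000` bytes. The API's own validation remains authoritative.

```python
# Allowed values and limits as listed in the parameter table.
# Assumption: "1GB" is interpreted as 1 GiB (1024**3 bytes).
MAX_SIZE_BYTES_LIMIT = 1024 ** 3
REQUIRED_FIELDS = ("name", "storage_type", "storage_url")
STORAGE_TYPES = {"s3", "gcs", "s3_compatible"}
OUTPUT_TYPES = {"json", "jsonl", "gzip_json", "gzip_jsonl"}


def validate_aggregator_config(config: dict) -> list:
    """Return a list of problems found in an aggregator config (empty if none)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            errors.append(f"missing required field: {field}")
    storage_type = config.get("storage_type")
    if storage_type and storage_type not in STORAGE_TYPES:
        errors.append(f"unsupported storage_type: {storage_type}")
    output_type = config.get("file_output_type")
    if output_type is not None and output_type not in OUTPUT_TYPES:
        errors.append(f"unsupported file_output_type: {output_type}")
    size = config.get("max_size_bytes")
    if size is not None and not (0 < size <= MAX_SIZE_BYTES_LIMIT):
        errors.append("max_size_bytes must be between 1 and 1 GB")
    count = config.get("max_result_count")
    if count is not None and count <= 0:
        errors.append("max_result_count must be positive")
    return errors
```

An empty list means the configuration passed these local checks; anything else can be shown to the user before making the API call.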
{% endstep %}

{% step %}

### Send requests to the aggregator

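Once the aggregator is created, you can route scraping jobs to it with the `aggregate_name` parameter. You don't need to specify storage details in these requests; the aggregator handles delivery.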

#### Request example

```bash
curl --user "USERNAME:PASSWORD" \
'https://data.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{
    "source": "universal",
    "url": "https://www.example.com",
    "aggregate_name": "amazon_hourly"
}'
```
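When you have many URLs to route into the same aggregator, each job is an ordinary `/v1/queries` request with `aggregate_name` added. This Python sketch builds one such request per URL without sending it; `build_job_requests` is a hypothetical helper, the second example URL is illustrative, and the `universal` source and credentials mirror the curl example above.

```python
import base64
import json
import urllib.request

QUERIES_URL = "https://data.oxylabs.io/v1/queries"


def build_job_requests(username, password, urls, aggregator_name):
    """Build one POST /v1/queries request per URL, all routed to one aggregator."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    built = []
    for url in urls:
        payload = {
            "source": "universal",
            "url": url,
            # No storage settings here: the aggregator handles delivery.
            "aggregate_name": aggregator_name,
        }
        built.append(
            urllib.request.Request(
                QUERIES_URL,
                data=json.dumps(payload).encode("utf-8"),
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Basic {token}",
                },
                method="POST",
            )
        )
    return built


jobs = build_job_requests(
    "USERNAME",
    "PASSWORD",
    ["https://www.example.com", "https://www.example.com/page/2"],
    "amazon_hourly",
)
# Each job can then be sent with urllib.request.urlopen(job).
```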

{% endstep %}

{% step %}

### Get aggregator information

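You can view an aggregator's configuration and usage statistics at any time with the `GET https://data.oxylabs.io/v1/aggregators/{name}` endpoint.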

#### Request example

```bash
curl https://data.oxylabs.io/v1/aggregators/amazon_hourly -u "USERNAME:PASSWORD"
```

#### Response example

```json
{
    "name": "amazon_hourly",
    "callback_url": "",
    "storage_type": "s3",
    "storage_url": "s3://my_bucket/path_for_aggregates",
    "max_result_count": 1048576,
    "max_size_bytes": 524288000,
    "schedule": "0 */1 * * *",
    "file_output_type": "jsonl",
    "filename_prefix": "",
    "filename_suffix": "",
    "created_at": "2025-12-05T13:30:32Z",
    "usage_statistics": {
        "total_result_count": 0,
        "total_bytes_delivered": 0,
        "total_files_delivered": 0
    }
}
```
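To read these statistics programmatically, you can fetch the same endpoint and extract the `usage_statistics` object. In this sketch, `build_info_request` and `summarize_usage` are hypothetical helpers; the field names come from the response example above, and the average file size is a derived value, not a field returned by the API.

```python
import base64
import json
import urllib.parse
import urllib.request

AGGREGATORS_URL = "https://data.oxylabs.io/v1/aggregators"


def build_info_request(username: str, password: str, name: str) -> urllib.request.Request:
    """Build (but do not send) GET /v1/aggregators/{name}."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{AGGREGATORS_URL}/{urllib.parse.quote(name)}",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def summarize_usage(info: dict) -> dict:
    """Pull the usage counters out of an aggregator info response."""
    stats = info.get("usage_statistics", {})
    files = stats.get("total_files_delivered", 0)
    delivered = stats.get("total_bytes_delivered", 0)
    return {
        "name": info.get("name"),
        "results": stats.get("total_result_count", 0),
        "files": files,
        "bytes": delivered,
        # Derived value: None until at least one file has been delivered.
        "avg_file_bytes": delivered / files if files else None,
    }


# Live use (not run here):
#   with urllib.request.urlopen(build_info_request("USERNAME", "PASSWORD", "amazon_hourly")) as resp:
#       info = json.load(resp)
# Offline example using the response body shown above, trimmed to the relevant fields.
example = json.loads(
    '{"name": "amazon_hourly", "usage_statistics": '
    '{"total_result_count": 0, "total_bytes_delivered": 0, "total_files_delivered": 0}}'
)
summary = summarize_usage(example)
```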

{% endstep %}
{% endstepper %}

## Delivery and output

### Automatic delivery

A batch file is closed and uploaded as soon as any of the following happens:

* The `schedule` time limit is reached (maximum: 1 hour).
* The `max_size_bytes` size limit is reached (maximum: 1GB).
* The `max_result_count` result limit is reached.
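In other words, whichever limit is hit first closes the batch. The following Python model illustrates that rule for the hourly schedule used in this page's examples. It is a hypothetical simplification for reasoning about when batches close, not the service's actual implementation, and it handles only the `0 */1 * * *` schedule rather than arbitrary cron expressions; `BatchState` and `flush_reason` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BatchState:
    """Hypothetical view of an open batch: when it started and what it holds."""
    opened_at: datetime
    size_bytes: int = 0
    result_count: int = 0


def next_hourly_tick(opened_at: datetime) -> datetime:
    """Next top of the hour after opened_at, when "0 */1 * * *" would fire."""
    return opened_at.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def flush_reason(state: BatchState, now: datetime, max_size_bytes: int,
                 max_result_count: int) -> Optional[str]:
    """Return which limit closes the batch, or None if it stays open."""
    if now >= next_hourly_tick(state.opened_at):
        return "schedule"
    if state.size_bytes >= max_size_bytes:
        return "max_size_bytes"
    if state.result_count >= max_result_count:
        return "max_result_count"
    return None


# Limits from the "amazon_hourly" example: 500 MB or 10,000 results.
batch = BatchState(opened_at=datetime(2024, 8, 8, 1, 15), size_bytes=1000, result_count=5)
print(flush_reason(batch, datetime(2024, 8, 8, 1, 40), 524288000, 10000))  # None
print(flush_reason(batch, datetime(2024, 8, 8, 2, 0), 524288000, 10000))  # schedule
```

A batch opened at 01:15 stays open at 01:40 because no limit has been reached, and closes at 02:00 when the hourly schedule fires, even though it is far below the size and count limits.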

### Manual delivery

To deliver the current batch immediately, before any limit is reached, send a request to the `POST https://data.oxylabs.io/v1/aggregators/{name}/trigger` endpoint, as shown below:

```bash
curl -X POST https://data.oxylabs.io/v1/aggregators/amazon_hourly/trigger -u "USERNAME:PASSWORD"
```
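The same trigger call in Python might look like the sketch below. It assumes, as in the curl example, that the endpoint takes no request body; `build_trigger_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request


def build_trigger_request(username: str, password: str, name: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/aggregators/{name}/trigger with no body."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    url = f"https://data.oxylabs.io/v1/aggregators/{urllib.parse.quote(name)}/trigger"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


trigger = build_trigger_request("USERNAME", "PASSWORD", "amazon_hourly")
# To send it: urllib.request.urlopen(trigger)
```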

### Output structure

Output batch files are saved to your storage location, each named with a unique timestamp followed by the aggregator name:

```
my_bucket/
├── batches/
│   ├── 2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl
│   ├── 2024-08-08T02:00:00.000-00:00-amazon_hourly.jsonl
│   └── 2024-08-08T03:00:00.000-00:00-amazon_hourly.jsonl
```
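If you post-process these files, the timestamp prefix makes it easy to order batches. This sketch parses names in the format shown above; it assumes the timestamp always has exactly this shape and that aggregator names contain no dots, and `parse_batch_filename` and `latest_batch` are hypothetical helpers.

```python
import re
from datetime import datetime

# Timestamp prefix as in "2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl".
BATCH_NAME = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2})-(?P<rest>.+)$"
)


def parse_batch_filename(filename):
    """Split a batch file name into (timestamp, aggregator name, extension)."""
    match = BATCH_NAME.match(filename)
    if match is None:
        return None
    # Assumes the aggregator name itself contains no dots.
    name, _, extension = match.group("rest").partition(".")
    return datetime.fromisoformat(match.group("ts")), name, extension


def latest_batch(filenames, aggregator_name):
    """Return the newest batch file for one aggregator, or None if there is none."""
    newest = None
    for filename in filenames:
        parsed = parse_batch_filename(filename)
        if parsed is None or parsed[1] != aggregator_name:
            continue
        if newest is None or parsed[0] > newest[0]:
            newest = (parsed[0], filename)
    return newest[1] if newest else None


files = [
    "2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl",
    "2024-08-08T03:00:00.000-00:00-amazon_hourly.jsonl",
    "2024-08-08T02:00:00.000-00:00-amazon_hourly.jsonl",
]
```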

