# Result Aggregator

The **Result Aggregator** lets you collect results from many separate scraping or parsing jobs into a single aggregated file. It is most useful when your jobs produce numerous small files that you would rather combine and process as larger batch files (JSON, JSONL, or Gzip).

The aggregated responses can be delivered to your [cloud storage](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/result-processing-and-storage/cloud-storage) (Google Cloud Storage, Amazon S3, or other S3-compatible services).

## How to use it

{% stepper %}
{% step %}

### Create an aggregator

First, define an aggregator instance with a delivery storage target and delivery triggers.

#### Request example

The following request creates an aggregator that uploads a batch file every hour (the `schedule` cron expression), when it collects 10,000 results (`max_result_count`), or when the file reaches 500 MB (`524288000` bytes in `max_size_bytes`), whichever comes first.

```bash
curl -X POST https://data.oxylabs.io/v1/aggregators \
-u "USERNAME:PASSWORD" \
-H "Content-Type: application/json" \
-d '{
  "name": "amazon_hourly",
  "storage_type": "s3",
  "storage_url": "s3://my_bucket/batches",
  "max_result_count": 10000,
  "max_size_bytes": 524288000,
  "schedule": "0 */1 * * *"
}'
```

#### Request parameters

| Parameter                                                   | Description                                                                                                                                         | Type      |
| ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | --------- |
| <mark style="background-color:green;">`name`</mark>         | Unique aggregator identifier.                                                                                                                       | `string`  |
| <mark style="background-color:green;">`storage_type`</mark> | Storage provider ( `s3`, `gcs`, or `s3_compatible`).                                                                                                | `string`  |
| <mark style="background-color:green;">`storage_url`</mark>  | Destination bucket/container path.                                                                                                                  | `string`  |
| `file_output_type`                                          | Output format (`json`, `jsonl`, `gzip_json`, or `gzip_jsonl`).                                                                                      | `string`  |
| `max_size_bytes`                                            | <p>Maximum batch file size in bytes.</p><p>Max: <strong>1GB</strong>.</p>                                                                           | `integer` |
| `schedule`                                                  | <p>Aggregation frequency in <strong>cron expression</strong>. (e.g., <code>0 \*/1 \* \* \*</code> for every hour).<br>Max: <strong>1h</strong>.</p> | `string`  |
| `max_result_count`                                          | Triggers delivery when the result count reaches the limit.                                                                                          | `integer` |
| `callback_url`                                              | URL to your callback endpoint. [**More info**](https://developers.oxylabs.io/scraping-solutions/integration-methods/push-pull#callback)             | `string`  |

<mark style="background-color:green;">Green</mark> – mandatory parameter.
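Before sending the creation request, you may want to validate the body against the documented limits. Below is a minimal, hypothetical Python helper that assembles the JSON payload; the field names mirror the parameters above, while the validation thresholds (the 1 GB size ceiling, the allowed storage types) are taken from this page rather than enforced client-side by any official SDK.

```python
# Hypothetical helper: builds the body for POST /v1/aggregators and
# checks the documented limits before anything is sent over the wire.

MAX_SIZE_BYTES = 1 * 1024 ** 3  # documented 1 GB ceiling for max_size_bytes

def build_aggregator_payload(name, storage_type, storage_url,
                             max_size_bytes=None, schedule=None,
                             max_result_count=None, file_output_type=None):
    """Assemble the JSON body for aggregator creation."""
    if storage_type not in ("s3", "gcs", "s3_compatible"):
        raise ValueError(f"unsupported storage_type: {storage_type}")
    if max_size_bytes is not None and max_size_bytes > MAX_SIZE_BYTES:
        raise ValueError("max_size_bytes exceeds the documented 1 GB limit")
    payload = {"name": name, "storage_type": storage_type,
               "storage_url": storage_url}
    # Optional trigger settings are included only when supplied.
    for key, value in (("max_size_bytes", max_size_bytes),
                       ("schedule", schedule),
                       ("max_result_count", max_result_count),
                       ("file_output_type", file_output_type)):
        if value is not None:
            payload[key] = value
    return payload
```

Calling `build_aggregator_payload("amazon_hourly", "s3", "s3://my_bucket/batches", max_size_bytes=524288000, schedule="0 */1 * * *", max_result_count=10000)` reproduces the request body from the example above.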
{% endstep %}

{% step %}

### Send requests to aggregator

Once your aggregator is created, you can route scraping jobs to it using the `aggregate_name` parameter. You do not need to specify storage details in these requests; the aggregator handles the delivery.

#### Request example

```bash
curl --user "USERNAME:PASSWORD" \
'https://data.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{
    "source": "universal",
    "url": "https://www.example.com",
    "aggregate_name": "amazon_hourly"
}'
```
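Because the aggregator owns the delivery configuration, the per-job body stays small. A minimal sketch of building such bodies in Python (the helper name is illustrative, not part of any SDK):

```python
# Hypothetical helper: builds the body for POST /v1/queries when routing
# a job to a named aggregator. Note the absence of any storage fields --
# the aggregator configuration supplies those.

def build_query(url, aggregate_name, source="universal"):
    return {"source": source, "url": url, "aggregate_name": aggregate_name}

# Fan many URLs into the same hourly batch:
jobs = [build_query(u, "amazon_hourly")
        for u in ("https://www.example.com/a", "https://www.example.com/b")]
```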

{% endstep %}

{% step %}

### Retrieve aggregator info

You can check the configuration and usage statistics of your aggregator at any time.

#### Request example

```bash
curl -u "USERNAME:PASSWORD" https://data.oxylabs.io/v1/aggregators/{name}
```

#### Response example

```json
{
    "name": "amazon_hourly",
    "callback_url": "",
    "storage_type": "s3",
    "storage_url": "s3://my_bucket/path_for_aggregates",
    "max_result_count": 1048576,
    "max_size_bytes": 524288000,
    "schedule": "0 */1 * * *",
    "file_output_type": "jsonl",
    "filename_prefix": "",
    "filename_suffix": "",
    "created_at": "2025-12-05T13:30:32Z",
    "usage_statistics": {
        "total_result_count": 0,
        "total_bytes_delivered": 0,
        "total_files_delivered": 0
    }
}
```
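The `usage_statistics` object is convenient for monitoring. A short sketch of reading it in Python; the `response_text` literal below simply echoes a subset of the example response, and the derived average is our own illustration, not an API field:

```python
import json

# A trimmed copy of the example response above, pasted in for illustration.
response_text = """
{
    "name": "amazon_hourly",
    "schedule": "0 */1 * * *",
    "usage_statistics": {
        "total_result_count": 0,
        "total_bytes_delivered": 0,
        "total_files_delivered": 0
    }
}
"""

info = json.loads(response_text)
stats = info["usage_statistics"]
files = stats["total_files_delivered"]
# Guard against division by zero for a freshly created aggregator.
avg_bytes_per_file = stats["total_bytes_delivered"] / files if files else 0
print(f"{info['name']}: {files} files delivered, "
      f"{avg_bytes_per_file:.0f} bytes/file on average")
```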

{% endstep %}
{% endstepper %}

## Delivery & Output

### Automatic delivery

A batch file is closed and uploaded when any of the following occur:

* The `schedule` time limit is reached (Max: 1 hour).
* The `max_size_bytes` size limit is reached (Max: 1GB).
* The `max_result_count` result limit is reached.
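The "whichever comes first" logic can be expressed as a simple disjunction. A sketch of the decision, assuming the three documented triggers and nothing else:

```python
def should_deliver(elapsed_seconds, batch_bytes, result_count,
                   schedule_seconds, max_size_bytes, max_result_count):
    """Return True when any documented delivery trigger fires."""
    return (elapsed_seconds >= schedule_seconds      # schedule reached
            or batch_bytes >= max_size_bytes         # size limit reached
            or result_count >= max_result_count)     # result count reached
```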

### Manual delivery

You can force immediate delivery of the current batch, before any limit is reached, by calling the `POST https://data.oxylabs.io/v1/aggregators/{name}/trigger` endpoint, as in the example below:

```bash
curl -X POST https://data.oxylabs.io/v1/aggregators/amazon_hourly/trigger -u "USERNAME:PASSWORD"
```

### Output structure

Output batch files are saved to your storage with unique timestamps:

```
my_bucket/
├── batches/
│   ├── 2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl
│   ├── 2024-08-08T02:00:00.000-00:00-amazon_hourly.jsonl
│   └── 2024-08-08T03:00:00.000-00:00-amazon_hourly.jsonl
```
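If you post-process the bucket contents, the filenames can be split back into their parts. The pattern below is an assumption inferred from the listing above (`<ISO-8601 timestamp>-<aggregator name>.<extension>`), not a documented contract:

```python
import re
from datetime import datetime

# Assumed filename layout, inferred from the example listing:
#   <ISO-8601 timestamp with UTC offset>-<aggregator name>.<extension>
BATCH_NAME = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2})"
    r"-(?P<name>.+)\.(?P<ext>jsonl|json|gz)$"
)

def parse_batch_filename(filename):
    """Split a batch filename into (timestamp, aggregator name, extension)."""
    m = BATCH_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognised batch filename: {filename}")
    return datetime.fromisoformat(m.group("ts")), m.group("name"), m.group("ext")
```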
