# Result Aggregator

The **Result Aggregator** lets you collect the results of many separate scraping or parsing jobs into a single aggregated file. This is most useful when you run numerous jobs that each return a small result file and you want to combine them into larger batch files (JSON, JSONL, or Gzip-compressed) for downstream processing.

The aggregated responses can be delivered to your [cloud storage](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/result-processing-and-storage/cloud-storage) (Google Cloud Storage, Amazon S3, or other S3-compatible services).

## How to use it

{% stepper %}
{% step %}

### Create an aggregator

First, define an aggregator instance with a delivery storage target and delivery triggers.

#### Request example

The following request creates an aggregator that uploads a batch file every hour (the cron `schedule`), when the file reaches 500 MB (`524288000` bytes), or when it accumulates 10,000 results (`max_result_count`), whichever comes first.

```bash
curl -X POST https://data.oxylabs.io/v1/aggregators \
-u "USERNAME:PASSWORD" \
-H "Content-Type: application/json" \
-d '{
  "name": "amazon_hourly",
  "storage_type": "s3",
  "storage_url": "s3://my_bucket/batches",
  "max_result_count": 10000,
  "max_size_bytes": 524288000,
  "schedule": "0 */1 * * *"
}'
```
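If you prefer to create the aggregator from code, here is a minimal Python sketch of the same request, assuming the `requests` library; `USERNAME` and `PASSWORD` are placeholders for your API credentials.

```python
import requests

# Same payload as the curl example above.
payload = {
    "name": "amazon_hourly",
    "storage_type": "s3",
    "storage_url": "s3://my_bucket/batches",
    "max_result_count": 10000,
    "max_size_bytes": 524288000,  # 500 MB
    "schedule": "0 */1 * * *",    # deliver every hour
}

response = requests.post(
    "https://data.oxylabs.io/v1/aggregators",
    auth=("USERNAME", "PASSWORD"),  # your API credentials
    json=payload,
)
response.raise_for_status()
```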

#### Request parameters

| Parameter                                                   | Description                                                                                                                                         | Type      |
| ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | --------- |
| <mark style="background-color:green;">`name`</mark>         | Unique aggregator identifier.                                                                                                                       | `string`  |
| <mark style="background-color:green;">`storage_type`</mark> | Storage provider (`s3`, `gcs`, or `s3_compatible`).                                                                                                 | `string`  |
| <mark style="background-color:green;">`storage_url`</mark>  | Destination bucket/container path.                                                                                                                 | `string`  |
| `file_output_type`                                          | Output format (`json`, `jsonl`, `gzip_json`, or `gzip_jsonl`).                                                                                     | `string`  |
| `max_size_bytes`                                            | <p>Maximum batch size limit in bytes.</p><p>Max: <strong>1GB</strong>.</p>                                                                          | `integer` |
| `schedule`                                                  | <p>Aggregation frequency in <strong>cron expression</strong>. (e.g., <code>0 \*/1 \* \* \*</code> for every hour).<br>Max: <strong>1h</strong>.</p> | `string`  |
| `max_result_count`                                          | Triggers delivery when the result count reaches the limit.                                                                                          | `integer` |
| `callback_url`                                              | URL to your callback endpoint. [**More info**](https://developers.oxylabs.io/scraping-solutions/integration-methods/push-pull#callback)             | `string`  |

<mark style="background-color:green;">Green</mark> – mandatory parameter.
{% endstep %}

{% step %}

### Send requests to aggregator

Once your aggregator is created, you can route scraping jobs to it using the `aggregate_name` parameter. You do not need to specify storage details in these requests; the aggregator handles delivery.

#### Request example

```bash
curl --user "USERNAME:PASSWORD" \
'https://data.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{
    "source": "universal",
    "url": "https://www.example.com",
    "aggregate_name": "amazon_hourly"
}'
```
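The same job submission in Python (a minimal sketch with the `requests` library; the payload mirrors the curl example above):

```python
import requests

job = {
    "source": "universal",
    "url": "https://www.example.com",
    "aggregate_name": "amazon_hourly",  # route results to the aggregator
}

response = requests.post(
    "https://data.oxylabs.io/v1/queries",
    auth=("USERNAME", "PASSWORD"),
    json=job,
)
response.raise_for_status()
print(response.json())  # job metadata returned by the API
```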

{% endstep %}

{% step %}

### Retrieve aggregator info

You can check the configuration and usage statistics of your aggregator at any time via the `GET https://data.oxylabs.io/v1/aggregators/{name}` endpoint.

#### Request example

```bash
curl -X GET https://data.oxylabs.io/v1/aggregators/amazon_hourly -u "USERNAME:PASSWORD"
```

#### Response example

```json
{
    "name": "amazon_hourly",
    "callback_url": "",
    "storage_type": "s3",
    "storage_url": "s3://my_bucket/path_for_aggregates",
    "max_result_count": 1048576,
    "max_size_bytes": 524288000,
    "schedule": "0 */1 * * *",
    "file_output_type": "jsonl",
    "filename_prefix": "",
    "filename_suffix": "",
    "created_at": "2025-12-05T13:30:32Z",
    "usage_statistics": {
        "total_result_count": 0,
        "total_bytes_delivered": 0,
        "total_files_delivered": 0
    }
}
```
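For monitoring, you can read the `usage_statistics` object from this response. A minimal Python sketch, assuming the `requests` library and the response shape shown above:

```python
import requests

name = "amazon_hourly"
response = requests.get(
    f"https://data.oxylabs.io/v1/aggregators/{name}",
    auth=("USERNAME", "PASSWORD"),
)
response.raise_for_status()

stats = response.json()["usage_statistics"]
print(f"{name}: {stats['total_result_count']} results in "
      f"{stats['total_files_delivered']} files "
      f"({stats['total_bytes_delivered']} bytes delivered)")
```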

{% endstep %}
{% endstepper %}

## Delivery & Output

### Automatic delivery

A batch file is closed and uploaded when any of the following occur:

* The `schedule` time limit is reached (Max: 1 hour).
* The `max_size_bytes` size limit is reached (Max: 1GB).
* The `max_result_count` result limit is reached.

### Manual delivery

You can force immediate delivery of the current batch before any limit is reached by calling the `POST https://data.oxylabs.io/v1/aggregators/{name}/trigger` endpoint, as shown below:

```bash
curl -X POST https://data.oxylabs.io/v1/aggregators/amazon_hourly/trigger -u "USERNAME:PASSWORD"
```
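The equivalent call in Python (a minimal sketch using `requests`):

```python
import requests

# Force delivery of whatever the aggregator has buffered so far.
response = requests.post(
    "https://data.oxylabs.io/v1/aggregators/amazon_hourly/trigger",
    auth=("USERNAME", "PASSWORD"),
)
response.raise_for_status()
```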

### Output structure

Output batch files are saved to your storage with unique timestamps in their filenames:

```
my_bucket/
├── batches/
│   ├── 2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl
│   ├── 2024-08-08T02:00:00.000-00:00-amazon_hourly.jsonl
│   └── 2024-08-08T03:00:00.000-00:00-amazon_hourly.jsonl
```
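
Once you download a batch from your bucket, each line of a `.jsonl` file holds one job result; the per-result schema depends on the `source` you scraped. A hedged Python sketch for consuming a batch, assuming Gzip-compressed variants carry a `.gz` suffix:

```python
import gzip
import json

# Filename taken from the layout above.
path = "2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl"

# Use gzip.open for gzip_jsonl batches (assumed .gz suffix), plain open for jsonl.
opener = gzip.open if path.endswith(".gz") else open
with opener(path, "rt") as f:
    for line in f:
        result = json.loads(line)  # one aggregated job result per line
        # ...process the result...
```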
