
Result Aggregator

Learn how to return multiple Web Scraper API responses as a single larger output using Result Aggregator.

The Result Aggregator lets you collect multiple small results from separate scraping or parsing jobs into a single aggregated file. It is most useful when you run numerous jobs that each return a small file and you want to combine them into larger output collections, or when you need to process results as batch files (JSON, JSONL, or Gzip).

The aggregated responses can be delivered to your cloud storage (Google Cloud Storage, Amazon S3, or other S3-compatible services).

How to use it

1

Create an aggregator

First, define an aggregator instance with a delivery storage target and delivery triggers.

Request example

The following request creates an aggregator that uploads a batch file every hour (cron schedule), when the file reaches 500 MB (524288000 bytes), or when it contains 10,000 results, whichever comes first.

curl -X POST https://data.oxylabs.io/v1/aggregators \
-u "USERNAME:PASSWORD" \
-H "Content-Type: application/json" \
-d '{
  "name": "amazon_hourly",
  "storage_type": "s3",
  "storage_url": "s3://my_bucket/batches",
  "max_result_count": 10000,
  "max_size_bytes": 524288000,
  "schedule": "0 */1 * * *"
}'

Request parameters

  • name (string) – Unique aggregator identifier.

  • storage_type (string) – Storage provider (s3, gcs, or s3_compatible).

  • storage_url (string) – Destination bucket/container path.

  • file_output_type (string) – Output format (json, jsonl, gzip_json, or gzip_jsonl).

  • max_size_bytes (integer) – Maximum batch size in bytes. Max: 1GB.

  • schedule (string) – Aggregation frequency as a cron expression (e.g., 0 */1 * * * for every hour). Max: 24h.

  • max_result_count (integer) – Triggers delivery when the result count reaches this limit.

  • callback_url (string) – URL to your callback endpoint.

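For reference, here is a minimal sketch of a creation request that targets Google Cloud Storage with compressed JSONL output. The aggregator name amazon_hourly_gcs is an illustrative placeholder, and the gs:// path format for storage_url is an assumption (the example above only shows an s3:// path); the endpoint and remaining fields follow that example.

# Assumption: a gs:// path is accepted when storage_type is "gcs"; adjust to your bucket.
curl -X POST https://data.oxylabs.io/v1/aggregators \
-u "USERNAME:PASSWORD" \
-H "Content-Type: application/json" \
-d '{
  "name": "amazon_hourly_gcs",
  "storage_type": "gcs",
  "storage_url": "gs://my_bucket/batches",
  "file_output_type": "gzip_jsonl",
  "max_size_bytes": 524288000,
  "schedule": "0 */1 * * *"
}'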

2

Send requests to aggregator

Once your aggregator is created, you can route scraping jobs to it using the aggregate_name parameter. You do not need to specify storage details in these requests; the aggregator handles delivery.

Request example

curl --user "USERNAME:PASSWORD" \
'https://data.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{
    "source": "universal",
    "url": "https://www.example.com",
    "aggregate_name": "amazon_hourly"
}'
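
If you need to route several jobs to the same aggregator, a minimal shell sketch (the URL list below is hypothetical) simply repeats the call above for each page:

# Hypothetical URL list; every job is routed to the existing amazon_hourly aggregator.
for url in "https://www.example.com/page-1" "https://www.example.com/page-2"; do
  curl --user "USERNAME:PASSWORD" \
  'https://data.oxylabs.io/v1/queries' \
  -H "Content-Type: application/json" \
  -d "{
      \"source\": \"universal\",
      \"url\": \"${url}\",
      \"aggregate_name\": \"amazon_hourly\"
  }"
done
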
3

Retrieve aggregator info

You can check the configuration and usage statistics of your aggregator at any time.

Request example

GET https://data.oxylabs.io/v1/aggregators/{name}
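
The same request as a curl command, assuming this endpoint accepts the same basic authentication used in the other examples:

curl -X GET https://data.oxylabs.io/v1/aggregators/amazon_hourly -u "USERNAME:PASSWORD"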

Response example

{
    "name": "amazon_hourly",
    "callback_url": "",
    "storage_type": "s3",
    "storage_url": "s3://my_bucket/path_for_aggregates",
    "max_result_count": 1048576,
    "max_size_bytes": 524288000,
    "schedule": "0 */1 * * *",
    "file_output_type": "jsonl",
    "filename_prefix": "",
    "filename_suffix": "",
    "created_at": "2025-12-05T13:30:32Z",
    "usage_statistics": {
        "total_result_count": 0,
        "total_bytes_delivered": 0,
        "total_files_delivered": 0
    }
}

Delivery & Output

Automatic delivery

A batch file is closed and uploaded when any of the following occur:

  • The schedule time limit is reached (Default/Max: 24 hours).

  • The max_size_bytes size limit is reached (Default/Max: 1GB).

  • The max_result_count result limit is reached.

Manual delivery

You can force immediate delivery of the current batch before any limit is reached by calling the POST https://data.oxylabs.io/v1/aggregators/{name}/trigger endpoint, as in the example below:

curl -X POST https://data.oxylabs.io/v1/aggregators/amazon_hourly/trigger -u "USERNAME:PASSWORD"

Output structure

Output batch files are saved to your storage with unique timestamps in their filenames:

my_bucket/
├── batches/
│   ├── 2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl
│   ├── 2024-08-08T02:00:00.000-00:00-amazon_hourly.jsonl
│   └── 2024-08-08T03:00:00.000-00:00-amazon_hourly.jsonl
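
Once a batch lands in your bucket, you can inspect it with standard tooling. A minimal sketch, assuming the AWS CLI is configured with read access to my_bucket and that each line of a .jsonl batch file holds one job result:

# Stream one batch file from S3 to stdout and count the results it contains
# (one JSON object per line in a .jsonl file).
aws s3 cp "s3://my_bucket/batches/2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl" - | wc -l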
