Result Aggregator
Learn how to combine multiple Web Scraper API responses into a single larger output using the Result Aggregator.
The Result Aggregator collects the results of separate scraping or parsing jobs into a single aggregated file. This is most useful when you run numerous jobs that each return a small result, and you would rather process them as batch files (JSON, JSONL, or Gzip) than handle many individual outputs.
The aggregated responses can be delivered to your cloud storage (Google Cloud Storage, Amazon S3, or other S3-compatible services).
How to use it
Create an aggregator
First, define an aggregator instance with a delivery storage target and delivery triggers.
Request example
The following request creates an aggregator that uploads a batch file every hour (cron schedule) or when the file reaches 500MB (524288000 bytes), whichever comes first.
curl -X POST https://data.oxylabs.io/v1/aggregators \
-u "USERNAME:PASSWORD" \
-H "Content-Type: application/json" \
-d '{
"name": "amazon_hourly",
"storage_type": "s3",
"storage_url": "s3://my_bucket/batches",
"max_result_count": 10000,
"max_size_bytes": 524288000,
"schedule": "0 */1 * * *"
}'
Request parameters
name (string) – Unique aggregator identifier.
storage_type (string) – Storage provider: s3, gcs, or s3_compatible.
storage_url (string) – Destination bucket/container path.
file_output_type (string) – Output format: json, jsonl, gzip_json, or gzip_jsonl.
max_size_bytes (integer) – Maximum batch size limit in bytes. Max: 1GB.
schedule (string) – Aggregation frequency as a cron expression (e.g., 0 */1 * * * for every hour). Max: 24h.
max_result_count (integer) – Triggers delivery when the result count reaches this limit. Mandatory parameter.
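For instance, combining these parameters, the following sketch creates an aggregator that delivers gzipped JSONL batches every 6 hours; the name, bucket path, and limits are illustrative placeholders:
curl -X POST https://data.oxylabs.io/v1/aggregators \
-u "USERNAME:PASSWORD" \
-H "Content-Type: application/json" \
-d '{
"name": "example_six_hourly",
"storage_type": "s3",
"storage_url": "s3://my_bucket/batches",
"file_output_type": "gzip_jsonl",
"max_result_count": 10000,
"max_size_bytes": 524288000,
"schedule": "0 */6 * * *"
}'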
Send requests to aggregator
Once your aggregator is created, you can route scraping jobs to it using the aggregate_name parameter. You do not need to specify storage details in these requests; the aggregator handles the delivery.
Request example
curl --user "USERNAME:PASSWORD" \
'https://data.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{
"source": "universal",
"url": "https://www.example.com",
"aggregate_name": "amazon_hourly"
}'
Retrieve aggregator info
You can check the configuration and usage statistics of your aggregator at any time.
Request example
GET https://data.oxylabs.io/v1/aggregators/{name}
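For example, with curl, using the same Basic Auth credentials as the other endpoints (amazon_hourly is the aggregator created earlier):
curl https://data.oxylabs.io/v1/aggregators/amazon_hourly -u "USERNAME:PASSWORD"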
Response example
{
"name": "amazon_hourly",
"callback_url": "",
"storage_type": "s3",
"storage_url": "s3://my_bucket/path_for_aggregates",
"max_result_count": 1048576,
"max_size_bytes": 524288000,
"schedule": "0 */1 * * *",
"file_output_type": "jsonl",
"filename_prefix": "",
"filename_suffix": "",
"created_at": "2025-12-05T13:30:32Z",
"usage_statistics": {
"total_result_count": 0,
"total_bytes_delivered": 0,
"total_files_delivered": 0
}
}
Delivery & Output
Automatic delivery
A batch file is closed and uploaded when any of the following occur:
The schedule time limit is reached (Default/Max: 24 hours).
The max_size_bytes size limit is reached (Default/Max: 1GB).
The max_result_count result limit is reached.
Manual delivery
You can force an immediate delivery of the current batch before any limit is reached using the POST https://data.oxylabs.io/v1/aggregators/{name}/trigger endpoint, as in the example below:
curl -X POST https://data.oxylabs.io/v1/aggregators/amazon_hourly/trigger -u "USERNAME:PASSWORD"
Output structure
Output batch files are saved to your storage with unique timestamps:
my_bucket/
├── batches/
│ ├── 2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl
│ ├── 2024-08-08T02:00:00.000-00:00-amazon_hourly.jsonl
│ └── 2024-08-08T03:00:00.000-00:00-amazon_hourly.jsonl
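To sanity-check a delivered batch, you can stream it straight from your bucket. A minimal sketch, assuming the AWS CLI is configured with access to my_bucket and the aggregator uses the jsonl output format (pipe through gunzip first for gzip_jsonl files):
# Print the first aggregated result (each JSONL line holds one job's result)
aws s3 cp "s3://my_bucket/batches/2024-08-08T01:00:00.000-00:00-amazon_hourly.jsonl" - | head -n 1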