Scheduler
Scheduler is a free feature of the Scraper APIs that lets you automate recurring scraping and parsing jobs by creating schedules.
We advise using Scheduler together with the Upload to Cloud Storage feature. This way, you can set up your schedule and receive regular data updates in your storage without having to fetch results from our system.
IMPORTANT: Scheduler is a powerful tool that can quickly raise your service bill. We advise testing it with a few job items and a limited number of repeats to ensure you get the correct data at the right intervals. Once that's established, you can stop the test schedule and create a new, scaled-up schedule.
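To get a feel for how quickly jobs add up, the sketch below estimates the total number of jobs a schedule would create. It is a rough illustration only, not a cron parser: it assumes a simple weekly schedule (for example, Mondays at 03:00), and the helper name `count_weekly_runs` is hypothetical.

```python
from datetime import datetime, timedelta


def count_weekly_runs(start, end, weekday, hour, minute=0):
    """Count runs of a weekly schedule in [start, end] (end inclusive).

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    # Move to the scheduled time of day, then forward to the target weekday.
    first = start.replace(hour=hour, minute=minute, second=0, microsecond=0)
    first += timedelta(days=(weekday - start.weekday()) % 7)
    if first < start:
        # The scheduled time on the start day has already passed.
        first += timedelta(days=7)
    if first > end:
        return 0
    return (end - first).days // 7 + 1


# Assumed test window: June 2022, two job items, Mondays at 03:00.
runs = count_weekly_runs(
    start=datetime(2022, 6, 1),
    end=datetime(2022, 6, 30, 23, 59, 59),
    weekday=0,
    hour=3,
)
items_per_run = 2
print(f"{runs} runs x {items_per_run} items = {runs * items_per_run} jobs")
```

Running the same estimate with your real item count and end time before scaling up gives a quick sanity check on the expected job volume.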
When creating a new schedule, follow the simple steps below.
1. Tell us how often we should repeat the jobs by submitting a cron schedule expression;
2. Give us a bunch of job parameter sets that we should execute at scheduled times;
3. Let us know when to stop by submitting an end time.
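The three steps above map onto the three fields of a schedule payload. The sketch below assembles them in Python. The helper `build_schedule_payload` is hypothetical, the five-field cron check is an assumption based on the sample expression on this page, and the timestamp format is copied from the sample payloads rather than from a formal specification.

```python
from datetime import datetime


def build_schedule_payload(cron, items, end_time):
    """Combine a cron expression, job parameter sets, and an end time."""
    # Assumption: a standard five-field cron expression, as in the samples.
    if len(cron.split()) != 5:
        raise ValueError("expected a five-field cron expression")
    if not items:
        raise ValueError("at least one job parameter set is required")
    return {
        "cron": cron,
        "items": list(items),
        # Format mirrors the sample payloads, e.g. "2032-12-21 12:34:45".
        "end_time": end_time.strftime("%Y-%m-%d %H:%M:%S"),
    }


payload = build_schedule_payload(
    cron="0 3 * * 1",
    items=[{"source": "universal", "url": "https://ip.oxylabs.io"}],
    end_time=datetime(2032, 12, 21, 12, 34, 45),
)
print(payload)
```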
HINT: Find out how to form job parameter sets by visiting the documentation of the particular Scraper API (SERP / E-Commerce / Real Estate / Web) you want to use.
NOTE: You can also download and import this Postman collection to try out all our Scheduler endpoints. New to Postman? Learn more about this tool here.
Scheduler has several endpoints you can use to control the service:
Use this endpoint to initiate a new schedule.
- Endpoint:
https://data.oxylabs.io/v1/schedules
- Method:
POST
- Authentication:
Basic
- Request headers:
Content-Type: application/json
Input
Parameter | Description | Default Value |
---|---|---|
cron | Cron schedule expression that determines how often the jobs are repeated. | - |
items | List of Scraper APIs job parameter sets that should be executed as part of the schedule. | - |
end_time | The time at which the schedule should stop running. NB: the end time is inclusive. | - |
- required parameter
The payload below makes Scheduler run two jobs at 03:00 every Monday until end_time (inclusive).
{
"cron": "0 3 * * 1",
"items": [
{"source": "universal", "url": "https://ip.oxylabs.io"},
{"source": "google_search", "query": "stuff"}
],
"end_time": "2032-12-21 12:34:45"
}
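As an illustration, the request for this payload could be prepared in Python with the standard library alone. This is a sketch, not an official client: the helper names are hypothetical, and USERNAME and PASSWORD are placeholders for your own credentials. The request is only built here; the lines that would send it are left commented out because submitting a schedule creates billable jobs.

```python
import base64
import json
import urllib.request

SCHEDULES_URL = "https://data.oxylabs.io/v1/schedules"


def basic_auth_header(username, password):
    """Encode credentials for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_create_request(payload, username, password):
    """Prepare (but do not send) the POST request that creates a schedule."""
    return urllib.request.Request(
        SCHEDULES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
        method="POST",
    )


payload = {
    "cron": "0 3 * * 1",
    "items": [
        {"source": "universal", "url": "https://ip.oxylabs.io"},
        {"source": "google_search", "query": "stuff"},
    ],
    "end_time": "2032-12-21 12:34:45",
}
request = build_create_request(payload, "USERNAME", "PASSWORD")

# Uncomment to actually submit the schedule:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```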
The response below confirms that the schedule was created successfully.
{
"schedule_id": 4134906379157007223,
"active": true,
"items_count": 2,
"cron": "0 3 * * 1",
"end_time": "2032-12-21 12:34:45",
"next_run_at": "2022-06-06 10:15:00"
}
Use this endpoint to get the list of all schedules associated with your user account.
- Endpoint:
https://data.oxylabs.io/v1/schedules
- Method:
GET
- Authentication:
Basic
This endpoint returns the list of all schedule IDs associated with the user account making the request.
See the sample response below.
{
"schedules": [
1764178033254455101,
2885262175311057587,
3251365810325795747,
4134906379157007223,
4164931482277157062
]
}
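A response like this can be turned into per-schedule URLs for the schedule info endpoint. The sketch below parses the sample body shown above; the helper name `schedule_info_urls` is hypothetical. Python integers have arbitrary precision, so the 19-digit schedule IDs survive JSON parsing intact.

```python
import json

SCHEDULES_URL = "https://data.oxylabs.io/v1/schedules"

# Sample response body copied from this page.
sample_body = """
{
  "schedules": [
    1764178033254455101,
    2885262175311057587,
    3251365810325795747,
    4134906379157007223,
    4164931482277157062
  ]
}
"""


def schedule_info_urls(body):
    """Map each schedule ID in the response to its schedule info URL."""
    ids = json.loads(body)["schedules"]
    return {schedule_id: f"{SCHEDULES_URL}/{schedule_id}" for schedule_id in ids}


urls = schedule_info_urls(sample_body)
for schedule_id, url in urls.items():
    print(schedule_id, url)
```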
Use this endpoint to get information about a specific schedule.
- Endpoint:
https://data.oxylabs.io/v1/schedules/{id}
- Method:
GET
- Authentication:
Basic
The payload below contains a sample schedule info response.
{
"schedule_id": 1764178033254455101,
"active": true,
"items_count": 3,
"cron": "0 3 * * 1",
"end_time": "2032-12-21 12:34:45",
"next_run_at": "2022-06-06 10:18:00",
"stats": {
"total_job_count": 3,
"job_create_outcomes": [
{
"status_code": 202,
"job_count": 3,
"ratio": 1
}
],
"job_result_outcomes": [
{
"status": "done",
"job_count": 2,
"ratio": 0.67
},
{
"status": "faulted",
"job_count": 1,
"ratio": 0.33
}
]
}
}
NOTE: Currently, items (i.e., the jobs to be executed as part of a schedule) are missing in the output. We are aware of this and will add them shortly.
Key | Description | Type |
---|---|---|
schedule_id | The unique ID of the schedule. | Integer |
active | Is the schedule active right now? | Boolean |
items_count | The number of items (jobs) in the schedule. | Integer |
cron | The cron expression associated with the schedule. | String |
end_time | The time at which the schedule will stop being repeated. | String |
next_run_at | The time at which the schedule will run next. | String |
stats | Contains job creation and job completion statistics. | JSON Object |
stats :total_job_count | The number of items (jobs) in the schedule. | Integer |
stats :job_create_outcomes | Contains job creation statistics. | JSON Array |
stats :job_create_outcomes :status_code | The status code received in response to an attempt to execute the schedule (create a scraping/parsing job). | Integer |
stats :job_create_outcomes :job_count | The number of job creation attempts that resulted in that particular status code. | Integer |
stats :job_create_outcomes :ratio | The ratio between the number of job creation attempts that resulted in that particular status code and the total number of job creation attempts. | Float |
stats :job_result_outcomes | Contains the outcome statistics of scraping/parsing jobs executed as part of the schedule. | JSON Array |
stats :job_result_outcomes :status | The job status. Possible values: pending (the job is still being processed), done (the job has been completed successfully), faulted (the job has failed). | String |
stats :job_result_outcomes :job_count | The number of jobs that resulted in that particular status. | Integer |
stats :job_result_outcomes :ratio | The ratio between the number of jobs with that particular status and the total number of jobs created. | Float |
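To show how these fields fit together, the sketch below recomputes the per-status ratios from the job counts in the sample response. The helper name `result_ratios` is hypothetical, and using the sum of the job_result_outcomes counts as the denominator is an assumption about how the service derives the ratio.

```python
def result_ratios(schedule_info):
    """Recompute per-status ratios from job_result_outcomes job counts."""
    outcomes = schedule_info["stats"]["job_result_outcomes"]
    # Assumption: the denominator is the sum of all outcome counts.
    total = sum(outcome["job_count"] for outcome in outcomes)
    if total == 0:
        return {}
    return {
        outcome["status"]: round(outcome["job_count"] / total, 2)
        for outcome in outcomes
    }


# Trimmed copy of the sample schedule info response.
sample_info = {
    "schedule_id": 1764178033254455101,
    "stats": {
        "total_job_count": 3,
        "job_result_outcomes": [
            {"status": "done", "job_count": 2, "ratio": 0.67},
            {"status": "faulted", "job_count": 1, "ratio": 0.33},
        ],
    },
}

ratios = result_ratios(sample_info)
print(ratios)
```

A check like this can back a simple alert, for example flagging a schedule whose faulted ratio exceeds a threshold you choose.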
Use this endpoint to activate or deactivate a particular schedule.
- Endpoint:
https://data.oxylabs.io/v1/schedules/{id}/state
- Method:
PUT
- Authentication:
Basic
By setting active to false, you can stop the execution of a particular schedule. By setting active to true, you can reactivate a previously stopped schedule.
{
"active": false
}
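A request that toggles a schedule could be prepared as in the sketch below. The helper name `build_state_request` is hypothetical, USERNAME and PASSWORD are placeholders, and the Content-Type header is an assumption carried over from the schedule creation endpoint, since this section does not list request headers. The request is built but not sent.

```python
import base64
import json
import urllib.request


def build_state_request(schedule_id, active, username, password):
    """Prepare (but do not send) the PUT request that sets a schedule's state."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return urllib.request.Request(
        f"https://data.oxylabs.io/v1/schedules/{schedule_id}/state",
        data=json.dumps({"active": active}).encode("utf-8"),
        headers={
            # Assumption: same content type as the create endpoint.
            "Content-Type": "application/json",
            "Authorization": "Basic " + token.decode("ascii"),
        },
        method="PUT",
    )


request = build_state_request(4134906379157007223, False, "USERNAME", "PASSWORD")
print(request.get_method(), request.full_url, request.data)

# Uncomment to actually deactivate the schedule:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```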
Output
null
The standard response is an empty response body with a 202 status code.