Oxylabs Documentation

Scheduler

Scheduler is a free Scraper APIs feature used to automate your recurring scraping and parsing jobs by creating schedules for them.
Check out a video tutorial below to learn more about this feature as well as to see how it works.
Step-by-step guide on automating your recurring scraping jobs using Scheduler
We advise using Scheduler together with the Upload to Cloud Storage feature. This way, you can set up your schedule and receive regular data updates in your storage without having to fetch the results from our system yourself.
IMPORTANT: Scheduler is a powerful tool and can easily rack up your service bill. We advise testing it with a few job items and a limited number of repeats to ensure you're getting the correct data at the right intervals. Once that's established, you can stop the test schedule and create a new, scaled-up schedule.

Quick Start

When creating a new schedule, follow the simple steps below.
  1. Tell us how often we should repeat the jobs by submitting a cron schedule expression;
  2. Give us a bunch of job parameter sets that we should execute at scheduled times;
  3. Let us know when to stop by submitting an end time.
See here to find a code example for submitting a new schedule.
HINT: Find out how to form job parameter sets by visiting the documentation of the particular Scraper API (SERP / E-Commerce / Web) you want to use.
NOTE: You can also download and import this Postman collection to try out all of our Scheduler endpoints. New to Postman? Learn more about this tool here.
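The three Quick Start steps map onto three fields of a single payload. The snippet below is a minimal Python sketch of assembling one. The helper name `build_schedule_payload` and its checks are illustrative assumptions, not part of the Scheduler API, and the five-field cron check simply mirrors the example expression used on this page.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # format used by the examples on this page


def build_schedule_payload(cron, items, end_time):
    """Combine the three Quick Start inputs into one payload dictionary.

    Hypothetical helper: these are local sanity checks only and do not
    reflect any validation performed by the API itself.
    """
    if len(cron.split()) != 5:
        raise ValueError("expected a five-field cron expression, e.g. '0 3 * * 1'")
    if not items:
        raise ValueError("provide at least one job parameter set")
    return {
        "cron": cron,                                # step 1: how often to run
        "items": list(items),                        # step 2: what to run
        "end_time": end_time.strftime(TIME_FORMAT),  # step 3: when to stop
    }


# A small test schedule with a single job item.
test_payload = build_schedule_payload(
    cron="0 3 * * 1",
    items=[{"source": "universal", "url": "https://ip.oxylabs.io"}],
    end_time=datetime(2032, 12, 21, 12, 34, 45),
)
print(test_payload)
```

Starting with one or two items, as here, keeps a trial run cheap before you scale the schedule up.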

Endpoints

Scheduler has several endpoints you can use to control the service:

Create a new schedule

Overview

Use this endpoint to initiate a new schedule.
  • Endpoint: https://data.oxylabs.io/v1/schedules
  • Method: POST
  • Authentication: Basic
  • Request headers: Content-Type: application/json
Input
| Parameter | Description | Default Value |
| --- | --- | --- |
| cron | Cron schedule expression. It determines how often the submitted schedule will run. Read more here. | - |
| items | List of Scraper APIs job parameter sets that should be executed as part of the schedule. | - |
| end_time | The time at which the schedule should stop running. NB: the end time is inclusive. | - |
- required parameter
NOTE: For guidance on putting together job parameter sets for the items part of your Scheduler payload, refer to the documentation page of the particular scraper you would like to use.
The payload below makes Scheduler run two jobs at 03:00 every Monday until end_time (inclusive).
{
  "cron": "0 3 * * 1",
  "items": [
    {"source": "universal", "url": "https://ip.oxylabs.io"},
    {"source": "google_search", "query": "stuff"}
  ],
  "end_time": "2032-12-21 12:34:45"
}
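As a rough illustration, the request described above could be prepared with Python's standard library as follows. `USERNAME` and `PASSWORD` are placeholders and `build_create_request` is a hypothetical helper name. The request is only built here, not sent, because sending it creates a real schedule.

```python
import base64
import json
import urllib.request

SCHEDULES_URL = "https://data.oxylabs.io/v1/schedules"


def build_create_request(username, password, payload):
    """Build (but do not send) a POST request that creates a schedule."""
    # Basic authentication: base64-encoded "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        SCHEDULES_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


payload = {
    "cron": "0 3 * * 1",
    "items": [
        {"source": "universal", "url": "https://ip.oxylabs.io"},
        {"source": "google_search", "query": "stuff"},
    ],
    "end_time": "2032-12-21 12:34:45",
}
request = build_create_request("USERNAME", "PASSWORD", payload)

# Uncomment to send the request (this creates a real, billable schedule):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```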

Output

The response below confirms that the schedule was created successfully.
{
  "schedule_id": 4134906379157007223,
  "active": true,
  "items_count": 3,
  "cron": "0 3 * * 1",
  "end_time": "2032-12-21 12:34:45",
  "next_run_at": "2022-06-06 10:15:00"
}
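A response like this one can be read into native Python types, for instance as sketched below. The timestamp format is inferred from the sample values on this page. The time zone is not stated here, so the parsed values are deliberately left as naive datetimes.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # inferred from the sample response

raw_response = """
{
  "schedule_id": 4134906379157007223,
  "active": true,
  "items_count": 3,
  "cron": "0 3 * * 1",
  "end_time": "2032-12-21 12:34:45",
  "next_run_at": "2022-06-06 10:15:00"
}
"""

schedule = json.loads(raw_response)

# Keep the ID around: the per-schedule endpoints need it.
schedule_id = schedule["schedule_id"]

# Naive datetimes; the time zone is an open question in this sketch.
next_run_at = datetime.strptime(schedule["next_run_at"], TIME_FORMAT)
end_time = datetime.strptime(schedule["end_time"], TIME_FORMAT)

print(schedule_id, schedule["active"], next_run_at, end_time)
```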

Get all schedules

Overview

Use this endpoint to get the list of all schedules associated with your user account.
  • Endpoint: https://data.oxylabs.io/v1/schedules
  • Method: GET
  • Authentication: Basic

Output

This endpoint returns the list of all schedule IDs that are associated with the user account that is making the request.
See the sample response below.
{
  "schedules": [
    1764178033254455101,
    2885262175311057587,
    3251365810325795747,
    4134906379157007223,
    4164931482277157062
  ]
}
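The sketch below shows one way to prepare this GET request and pull the IDs out of a response body in Python. `build_list_request` and `extract_schedule_ids` are hypothetical helper names, the credentials are placeholders, and the sample body is a shortened copy of the response above.

```python
import base64
import json
import urllib.request

SCHEDULES_URL = "https://data.oxylabs.io/v1/schedules"


def build_list_request(username, password):
    """Build (but do not send) a GET request that lists all schedule IDs."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        SCHEDULES_URL,
        method="GET",
        headers={"Authorization": f"Basic {token}"},
    )


def extract_schedule_ids(response_body):
    """Return the list of schedule IDs from a JSON response body."""
    return json.loads(response_body)["schedules"]


request = build_list_request("USERNAME", "PASSWORD")

# Shortened sample body; Python integers hold these large IDs exactly.
sample_body = '{"schedules": [1764178033254455101, 2885262175311057587]}'
schedule_ids = extract_schedule_ids(sample_body)
print(schedule_ids)
```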

Get schedule information

Overview

Use this endpoint to get information about a specific schedule.
  • Endpoint: https://data.oxylabs.io/v1/schedules/{id}
  • Method: GET
  • Authentication: Basic

Output

The payload below contains a sample schedule info response.
{
  "schedule_id": 1764178033254455101,
  "active": true,
  "items_count": 3,
  "cron": "0 3 * * 1",
  "end_time": "2032-12-21 12:34:45",
  "next_run_at": "2022-06-06 10:18:00",
  "stats": {
    "total_job_count": 3,
    "job_create_outcomes": [
      {
        "status_code": 202,
        "job_count": 3,
        "ratio": 1
      }
    ],
    "job_result_outcomes": [
      {
        "status": "done",
        "job_count": 2,
        "ratio": 0.67
      },
      {
        "status": "faulted",
        "job_count": 1,
        "ratio": 0.33
      }
    ]
  }
}
NOTE: At the moment, items (i.e., the jobs to be executed as part of a schedule) are missing from the output. We are aware of this and will add them shortly.
| Key | Description | Type |
| --- | --- | --- |
| schedule_id | The unique ID of the schedule. | Integer |
| active | Is the schedule active right now? | Boolean |
| items_count | The number of items (jobs) in the schedule. | Integer |
| cron | The cron expression associated with the schedule. | String |
| end_time | The time at which the schedule will stop being repeated. | String |
| next_run_at | The time at which the schedule will run next. | String |
| stats | Contains job creation and job completion statistics. | JSON Object |
| stats:total_job_count | The number of items (jobs) in the schedule. | Integer |
| stats:job_create_outcomes | Contains job creation statistics. | JSON Array |
| stats:job_create_outcomes:status_code | The status code received in response to an attempt to execute the schedule (create a scraping/parsing job). | Integer |
| stats:job_create_outcomes:job_count | The number of job creation attempts that resulted in that particular status code. | Integer |
| stats:job_create_outcomes:ratio | The ratio between the number of job creation attempts that resulted in that particular status code and the total number of job creation attempts. | Float |
| stats:job_result_outcomes | Contains the outcome statistics of scraping/parsing jobs executed as part of the schedule. | JSON Array |
| stats:job_result_outcomes:status | The job status. Possible values: pending (the job is still being processed), done (the job has been completed successfully), faulted (the job has failed). | String |
| stats:job_result_outcomes:job_count | The number of jobs that resulted in that particular status. | Integer |
| stats:job_result_outcomes:ratio | The ratio between the number of jobs with that particular status and the total number of jobs created. | Float |
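To make the ratio fields concrete, the sketch below recomputes them from the job counts in the sample stats object above. It assumes each ratio is the job count divided by total_job_count and rounded to two decimals, which matches the sample values but is not confirmed here. `recompute_ratios` is a hypothetical helper name.

```python
# The "stats" object from the sample response above.
stats = {
    "total_job_count": 3,
    "job_create_outcomes": [
        {"status_code": 202, "job_count": 3, "ratio": 1},
    ],
    "job_result_outcomes": [
        {"status": "done", "job_count": 2, "ratio": 0.67},
        {"status": "faulted", "job_count": 1, "ratio": 0.33},
    ],
}


def recompute_ratios(stats):
    """Map each job status to job_count / total_job_count (two decimals).

    Assumption: the denominator is total_job_count, as the sample suggests.
    """
    total = stats["total_job_count"]
    return {
        outcome["status"]: round(outcome["job_count"] / total, 2)
        for outcome in stats["job_result_outcomes"]
    }


ratios = recompute_ratios(stats)
print(ratios)

# A simple health check: flag the schedule if any job has faulted.
has_failures = ratios.get("faulted", 0) > 0
print("has failures:", has_failures)
```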

Deactivate or reactivate a schedule

Overview

Use this endpoint to activate or deactivate a particular schedule.
  • Endpoint: https://data.oxylabs.io/v1/schedules/{id}/state
  • Method: PUT
  • Authentication: Basic

Input

Set active to false to stop the execution of a particular schedule, or set it to true to reactivate a schedule that was previously stopped.
{
  "active": false
}
Output
null
The standard response is an empty response body with a 202 status code.
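A state change could be prepared in Python along the following lines. `build_state_request` is a hypothetical helper name and the credentials are placeholders. The JSON Content-Type header is an assumption, since this section does not list request headers. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

SCHEDULES_URL = "https://data.oxylabs.io/v1/schedules"


def build_state_request(username, password, schedule_id, active):
    """Build (but do not send) a PUT request that sets a schedule's state."""
    if not isinstance(active, bool):
        raise TypeError("active must be True or False")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{SCHEDULES_URL}/{schedule_id}/state",
        data=json.dumps({"active": active}).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Assumed header: not listed for this endpoint on this page.
            "Content-Type": "application/json",
        },
    )


# Deactivate the sample schedule used elsewhere on this page.
request = build_state_request("USERNAME", "PASSWORD", 1764178033254455101, False)

# Uncomment to send; per the text above, expect a 202 with an empty body:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```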

Cron Schedule Expression

Read more here and here.
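As a quick orientation, the sketch below splits an expression into the five fields of the standard cron layout (minute, hour, day of month, month, day of week). It assumes Scheduler follows that layout, which is consistent with the "0 3 * * 1" example on this page running at 03:00 on Mondays. `describe_cron` is a hypothetical helper and does not validate field values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Label each field of a five-field cron expression.

    Only the number of fields is checked; ranges, lists, and steps inside
    a field are passed through untouched.
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


fields = describe_cron("0 3 * * 1")
for name, value in fields.items():
    print(f"{name:>12}: {value}")
```

Read this way, "0 3 * * 1" means minute 0, hour 3, any day of the month, any month, and day of week 1 (Monday).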

API response codes

For API response codes, refer to API reference -> Response Codes -> API response codes section.