Scheduler
Scheduler is a free feature of the Scraper APIs that lets you automate recurring scraping and parsing jobs by creating schedules.
Check out the video tutorial below to learn more about Scheduler and how it works.
We advise using Scheduler together with the Upload to Cloud Storage feature. This way, you can set up your schedule and receive regular data updates in your storage without trying to fetch results from our system.
IMPORTANT: Scheduler is a powerful tool that can quickly raise your service bill. We advise testing it with a few job items and a limited number of repeats to ensure you get the correct data at the right intervals. Once that's established, you can stop the test schedule and create a new, scaled-up schedule.
To create a new schedule, follow these steps:
1. Tell us how often to repeat the jobs by submitting a cron schedule expression.
2. Provide the job parameter sets that we should execute at the scheduled times.
3. Let us know when to stop by submitting an end time.
See here to find a code example for submitting a new schedule.
NOTE: You can also download and import this Postman collection to try out all our Scheduler endpoints. New to Postman? Learn more about this tool here.
Scheduler has several endpoints you can use to control the service:
Use this endpoint to initiate a new schedule.
Endpoint: https://data.oxylabs.io/v1/schedules
Method: POST
Authentication: Basic
Request headers: Content-Type: application/json
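Input

| Parameter | Description |
| --- | --- |
| cron * | Cron schedule expression. It determines how often the submitted schedule will run. |
| items * | List of Scraper APIs job parameter sets that should be executed as part of the schedule. |
| end_time * | The time at which the schedule should stop running. NB: the end time is inclusive. |

* required parameter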
The payload below makes Scheduler run two jobs at 03:00 every Monday until end_time (inclusive).
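As a rough illustration, a schedule like the one described above could be assembled and submitted from Python along the following lines. This is a minimal sketch, not an official client: the helper names are hypothetical, the two job items are placeholder parameter sets, and the end_time timestamp format is an assumption. The endpoint, method, Basic authentication, and Content-Type header come from this page.

```python
import base64
import json
import urllib.request

SCHEDULES_URL = "https://data.oxylabs.io/v1/schedules"


def build_schedule_payload(cron, items, end_time):
    """Combine the three schedule inputs into one request body."""
    return {"cron": cron, "items": items, "end_time": end_time}


def build_create_request(payload, username, password):
    """Prepare (but do not send) a POST request with Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        SCHEDULES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


payload = build_schedule_payload(
    # Cron fields: minute 0, hour 3, any day of month, any month, Monday.
    cron="0 3 * * 1",
    items=[
        # Placeholder job parameter sets; replace with your own job parameters.
        {"source": "universal", "url": "https://example.com/page-1"},
        {"source": "universal", "url": "https://example.com/page-2"},
    ],
    end_time="2032-12-01 16:30:00",  # assumed timestamp format
)
request = build_create_request(payload, "USERNAME", "PASSWORD")

# Sending the request performs a real network call, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the body before a schedule starts generating billable jobs.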
The response below confirms that the schedule was created successfully.
Use this endpoint to get the list of all schedules associated with your user account.
Endpoint: https://data.oxylabs.io/v1/schedules
Method: GET
Authentication: Basic
This endpoint returns the list of all schedule IDs associated with the user account making the request.
See the sample response below.
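A request to this endpoint might be prepared in Python as sketched here. The helper names are hypothetical, and the shape of the JSON body is not assumed: fetch_json simply decodes whatever the endpoint returns.

```python
import base64
import json
import urllib.request

SCHEDULES_URL = "https://data.oxylabs.io/v1/schedules"


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def build_list_request(username, password):
    """Prepare (but do not send) the GET request that lists schedules."""
    return urllib.request.Request(
        SCHEDULES_URL,
        headers=basic_auth_header(username, password),
        method="GET",
    )


def fetch_json(request):
    """Send a prepared request and decode the JSON body (network call)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


list_request = build_list_request("USERNAME", "PASSWORD")
# schedules = fetch_json(list_request)  # uncomment to query the API
```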
Use this endpoint to list all runs of a schedule, including the metadata of each job and each run's success rate.
Endpoint: https://data.oxylabs.io/v1/schedules/{id}/runs
Method: GET
Authentication: Basic
The payload below contains a sample /runs endpoint response.
| Key | Description | Type |
| --- | --- | --- |
| runs | A collection of run objects that represent execution instances of a scheduled task or workflow. | Array |
| runs:run_id | A unique identifier for the specific run instance. | Integer |
| runs:jobs | A collection of job objects that were executed as part of this run. | Array |
| runs:success_rate | The ratio of successful jobs to total jobs in this run (ranges from 0 to 1). | Number |
| runs:jobs:id | A unique Oxylabs identifier for the specific job. | Integer |
| runs:jobs:create_status_code | HTTP status code returned when the job was created, indicating the initial acceptance of the job request. | Integer |
| runs:jobs:result_status | The execution status of the job (e.g., "done", "faulted", "pending"). | String |
| runs:jobs:created_at | Timestamp when the job was created. | String |
| runs:jobs:result_created_at | Timestamp when the job completed and produced a result. | String |
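To show how these fields fit together, the sketch below condenses a /runs response into one summary row per run. The function name and all sample values (IDs, timestamps, rates) are made up for illustration; only the key names follow the field descriptions above.

```python
def summarize_runs(response):
    """Return one summary dict per run in a /runs response."""
    summary = []
    for run in response.get("runs", []):
        jobs = run.get("jobs", [])
        # Collect jobs that have not finished successfully.
        not_done = [job["id"] for job in jobs if job.get("result_status") != "done"]
        summary.append(
            {
                "run_id": run["run_id"],
                "success_rate": run["success_rate"],
                "job_count": len(jobs),
                "not_done_job_ids": not_done,
            }
        )
    return summary


# Hypothetical response data shaped after the field descriptions above.
sample_runs = {
    "runs": [
        {
            "run_id": 1,
            "success_rate": 1.0,
            "jobs": [
                {
                    "id": 111,
                    "create_status_code": 202,
                    "result_status": "done",
                    "created_at": "2032-11-22 03:00:01",
                    "result_created_at": "2032-11-22 03:00:09",
                },
            ],
        },
        {
            "run_id": 2,
            "success_rate": 0.5,
            "jobs": [
                {"id": 112, "create_status_code": 202, "result_status": "done"},
                {"id": 113, "create_status_code": 202, "result_status": "faulted"},
            ],
        },
    ]
}

run_summary = summarize_runs(sample_runs)
```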
Use this endpoint to get the list of scraping jobs executed as a result of running a schedule.
Endpoint: https://data.oxylabs.io/v1/schedules/{id}/jobs
Method: GET
Authentication: Basic
The payload below contains a sample /jobs endpoint response.
Use this endpoint to get information about a specific schedule.
Endpoint: https://data.oxylabs.io/v1/schedules/{id}
Method: GET
Authentication: Basic
The payload below contains a sample schedule info response.
| Key | Description | Type |
| --- | --- | --- |
| schedule_id | The unique ID of the schedule. | Integer |
| active | Whether the schedule is currently active. | Boolean |
| items_count | The number of items (jobs) in the schedule. | Integer |
| cron | The cron expression associated with the schedule. | String |
| end_time | The time at which the schedule will stop being repeated. | String |
| next_run_at | The time at which the schedule will run next. | String |
| links | A collection of link objects that define available API endpoints related to a schedule resource. | Array |
| links:rel | The relationship identifier that explains the purpose of the link relative to the parent resource. | String |
| links:href | The URL path to the API endpoint. Represents the resource location that can be accessed. | String |
| links:method | The HTTP method to be used when accessing this endpoint. | String |
| stats | Contains job creation and job completion statistics. | JSON Object |
| stats:total_job_count | The number of items (jobs) in the schedule. | Integer |
| stats:job_create_outcomes | Contains job creation statistics. | JSON Array |
| stats:job_create_outcomes:status_code | The status code received in response to an attempt to execute the schedule (create a scraping/parsing job). | Integer |
| stats:job_create_outcomes:job_count | The number of job creation attempts that resulted in that particular status code. | Integer |
| stats:job_create_outcomes:ratio | The ratio between the number of job creation attempts that resulted in that particular status code and the total number of job creation attempts. | Float |
| stats:job_result_outcomes | Contains the outcome statistics of scraping/parsing jobs executed as part of the schedule. | JSON Array |
| stats:job_result_outcomes:status | The job status. Possible values: pending (the job is still being processed), done (the job has been completed successfully), faulted (the job has failed). | String |
| stats:job_result_outcomes:job_count | The number of jobs that resulted in that particular status. | Integer |
| stats:job_result_outcomes:ratio | The ratio between the number of jobs with that particular status and the total number of jobs created. | Float |
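The statistics fields can be read programmatically, for example to spot rejected job creation attempts. The sketch below is illustrative only: the function names and sample values are hypothetical, and it assumes job_result_outcomes is nested under stats (falling back to the top level if it is not).

```python
def outcome_ratio(outcomes, key, value):
    """Return the reported ratio for the outcome where outcome[key] == value."""
    for outcome in outcomes:
        if outcome.get(key) == value:
            return outcome.get("ratio", 0.0)
    return 0.0  # no outcome with that value was reported


def summarize_schedule(info):
    """Condense a schedule info response into a few health indicators."""
    stats = info.get("stats", {})
    create_outcomes = stats.get("job_create_outcomes", [])
    # Assumption: job_result_outcomes lives under stats; fall back to top level.
    result_outcomes = stats.get(
        "job_result_outcomes", info.get("job_result_outcomes", [])
    )
    # Count creation attempts answered with a non-2xx status code.
    rejected = sum(
        outcome["job_count"]
        for outcome in create_outcomes
        if not 200 <= outcome["status_code"] < 300
    )
    return {
        "schedule_id": info.get("schedule_id"),
        "active": info.get("active"),
        "rejected_job_creations": rejected,
        "done_ratio": outcome_ratio(result_outcomes, "status", "done"),
        "faulted_ratio": outcome_ratio(result_outcomes, "status", "faulted"),
    }


# Hypothetical response data shaped after the field descriptions above.
sample_info = {
    "schedule_id": 1234567890,
    "active": True,
    "items_count": 2,
    "cron": "0 3 * * 1",
    "stats": {
        "total_job_count": 8,
        "job_create_outcomes": [
            {"status_code": 202, "job_count": 6, "ratio": 0.75},
            {"status_code": 400, "job_count": 2, "ratio": 0.25},
        ],
        "job_result_outcomes": [
            {"status": "done", "job_count": 3, "ratio": 0.5},
            {"status": "pending", "job_count": 3, "ratio": 0.5},
        ],
    },
}

schedule_summary = summarize_schedule(sample_info)
```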
Use this endpoint to activate or deactivate a particular schedule.
Endpoint: https://data.oxylabs.io/v1/schedules/{id}/state
Method: PUT
Authentication: Basic
Set active to false to stop the execution of a particular schedule. Set active to true to reactivate a previously stopped schedule.
Output
The standard response is an empty response body with a 202 status code.
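A state change could be prepared in Python as sketched below. The helper name and schedule ID are hypothetical, and sending the body as JSON with a Content-Type header is an assumption, since this page does not list request headers for the endpoint.

```python
import base64
import json
import urllib.request


def build_state_request(schedule_id, active, username, password):
    """Prepare (but do not send) a PUT request that sets the schedule state."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://data.oxylabs.io/v1/schedules/{schedule_id}/state",
        data=json.dumps({"active": active}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumed header
            "Authorization": f"Basic {token}",
        },
        method="PUT",
    )


# Hypothetical schedule ID; active=False stops the schedule.
stop_request = build_state_request(1234567890, False, "USERNAME", "PASSWORD")

# Sending performs a real network call; a 202 status with an empty body
# is the standard response described above.
# with urllib.request.urlopen(stop_request) as response:
#     print(response.status)
```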
For API response codes, refer to the API section.
cron: Cron schedule expression. It determines how often the submitted schedule will run.