Crawler API exposes the following endpoints:

| Endpoint | Method | Authentication | Description |
|---|---|---|---|
| https://ect.oxylabs.io/v1/jobs | POST | Basic | Create a new crawling job. Send the payload as Content-Type: application/json. |
| https://ect.oxylabs.io/v1/jobs/{id}/stop-indexing | POST | Basic | Stop indexing for a job. |
| https://ect.oxylabs.io/v1/jobs/{id} | GET | Basic | Get job information. |
| https://ect.oxylabs.io/v1/jobs/{id}/sitemap | GET | Basic | Get the sitemap (the list of URLs found). |
| https://ect.oxylabs.io/v1/jobs/{id}/aggregate | GET | Basic | Get the aggregate result. |
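As a quick orientation, the sketch below walks through the job lifecycle in Python using these endpoints. The payload fields are explained in the sections that follow; the nested payload layout and the id field in the creation response are assumptions, and the credentials and values are placeholders.

```python
import requests

API_BASE = "https://ect.oxylabs.io/v1/jobs"
AUTH = ("USERNAME", "PASSWORD")  # Basic authentication credentials (placeholders)

# 1. Create a crawling job. `json=` sends the payload with Content-Type: application/json.
payload = {
    "url": "https://example.com",                                     # starting point
    "filters": {"crawl": [".*"], "process": [".*"], "max_depth": 1},  # see the filter sections below
    "scrape_params": {"source": "universal"},
    "output": {"type_": "sitemap"},
}
job = requests.post(API_BASE, auth=AUTH, json=payload).json()
job_id = job["id"]  # assumed response field name

# 2. Check job information.
info = requests.get(f"{API_BASE}/{job_id}", auth=AUTH).json()
print(info)

# 3. Once the job is done, fetch the sitemap (the list of URLs found).
sitemap = requests.get(f"{API_BASE}/{job_id}/sitemap", auth=AUTH)
print(sitemap.json())

# 4. Stop a running job early, if needed.
requests.post(f"{API_BASE}/{job_id}/stop-indexing", auth=AUTH)
```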
When creating a job, the payload accepts the following parameters:

| Parameter | Description |
|---|---|
| url | The URL of the crawling starting point. |
| filters | Crawling filters: the crawl, process, and max_depth settings described below. |
| filters:crawl | Regular expressions deciding which URLs are followed for more links (see the crawl section below). |
| filters:max_depth | The max length of URL chains the crawler will follow (see below). Default: -1 |
| scrape_params | Parameters passed to the scraper that performs the scraping jobs while crawling. |
| scrape_params:geo_location | The geographical location the result should be adapted for; the value format depends on the chosen source (see below). |
| output:type_ | The output type: sitemap, parsed, or html (see below). |
| upload | Cloud storage settings for the job result. |
| upload:storage_type | The storage service type. Valid value: s3 (for AWS S3). gcs (for Google Cloud Storage) is coming soon. |
| upload:storage_url | The URL of your storage bucket. |
| render | Set to html to enable JavaScript rendering. More info. |
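Putting the parameters together, a complete job payload might look like the sketch below. The nested layout assumes the colon notation (e.g. filters:crawl, upload:storage_type) denotes nested JSON objects, render is assumed to sit under scrape_params, and all values are illustrative.

```python
payload = {
    "url": "https://www.example.com",                     # crawl starting point
    "filters": {
        "crawl": ["https://www\\.example\\.com/.*"],      # which URLs to follow for more links
        "process": ["https://www\\.example\\.com/p/.*"],  # which URLs to keep in the result
        "max_depth": 2,
    },
    "scrape_params": {
        "source": "universal",           # scraper to use (see the source section below)
        "geo_location": "United States", # value format depends on the chosen source
        "render": "html",                # enable JavaScript rendering
    },
    "output": {"type_": "sitemap"},      # sitemap, parsed, or html
    "upload": {
        "storage_type": "s3",                        # AWS S3 (gcs coming soon)
        "storage_url": "s3://your-bucket/crawler/",  # your bucket URL
    },
}
```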
geo_location

The geo_location parameter value format will depend on the source you choose. Visit your chosen source documentation for more information. E.g., if your chosen source is universal_ecommerce, go to E-Commerce Scraper API -> All Domains -> Parameter Values to find the geo_location parameter values explained.

render

Setting the render parameter value to html enables JavaScript rendering. More info.
source

The source parameter lets you specify which scraper should be used to perform the scraping jobs while crawling. The parameter value you should use depends on the URL you are submitting: use universal (for Sitemap or HTML output) or universal_ecommerce (for parsed output). You will get the full list of available source values after signing up for a free trial or making a purchase.

user_agent_type

The user_agent_type parameter sets the device type and browser the scraping jobs should emulate.
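Since the right source value depends on the output you want, a small hypothetical helper can make the rule explicit: universal for Sitemap or HTML output, universal_ecommerce for parsed output.

```python
def scrape_params_for(output_type, geo_location=None):
    """Return a scrape_params dict matching the desired output type (sketch)."""
    # universal covers Sitemap and HTML output; universal_ecommerce is for parsed output.
    source = "universal_ecommerce" if output_type == "parsed" else "universal"
    params = {"source": source}
    if geo_location is not None:
        # The geo_location value format depends on the chosen source (see above).
        params["geo_location"] = geo_location
    return params

print(scrape_params_for("parsed", geo_location="United States"))
# {'source': 'universal_ecommerce', 'geo_location': 'United States'}
```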
The process and crawl filters rely solely on regular expressions (regex) to decide whether some action should be performed on a URL (or a result associated with it). There is no shortage of online material written on regex, so we are not going to go into detail on constructing regular expressions. There are no process or crawl filters applied by default. This means that if you don't submit any regular expressions for these filters, no crawling will take place (as we won't follow any URLs) and no results will be included in the sitemap/aggregate result.

process
The process filter lets you specify which URLs should be included in the job result. Every URL we come across will be evaluated for matching the process filters. If it's a match, the URL (or the contents of the URL) will be included in the job result. As a parameter value, please send one or more regular expressions in a JSON array, like this: "process": [".e", ".c", ".t"].

crawl
The crawl filter lets you specify which URLs (apart from the URL of the starting point) are to be scraped and checked for more URLs. In simple terms, every URL we find while crawling is evaluated for matching the crawl filters. If it's a match, we'll scrape the URL in question to look for more URLs. As a parameter value, please send one or more regular expressions in a JSON array, like this: "crawl": [".e", ".c", ".t"].
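To illustrate how the two filters work together, here is a hypothetical filter set plus a local regex check. The matches_any helper only mirrors the idea locally; how exactly Crawler API anchors its regexes is not specified on this page.

```python
import re

# Hypothetical filter set: follow links anywhere on the site, but only include
# blog posts in the job result.
filters = {
    "crawl": [r"https://www\.example\.com/.*"],
    "process": [r"https://www\.example\.com/blog/.*"],
}

def matches_any(url, patterns):
    """True if the URL matches at least one regex in the list."""
    return any(re.search(pattern, url) for pattern in patterns)

url = "https://www.example.com/blog/what-is-a-crawler"
print(matches_any(url, filters["crawl"]))    # True -> followed for more links
print(matches_any(url, filters["process"]))  # True -> included in the job result
```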
max_depth

The max_depth filter determines the max length of URL chains Crawler API will follow.

| Value | What gets scraped |
|---|---|
| -1 | No depth limit: URL chains of any length are followed. |
| 0 | The starting page only. |
| 1 | All URLs found in the starting page. |
| 2 | All URLs found in (all URLs found in) the starting page. |
| 3, 4, ... | One link level deeper per increment. |
| n | n * (all URLs found in) the starting page. |
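For instance, assuming depth is counted in link hops from the starting page (as the table above suggests), a filter set limited to two hops could look like this:

```python
# Assumption: depth is counted in link hops from the starting page.
filters = {
    "crawl": [".*"],
    "process": [".*"],
    "max_depth": 2,  # the starting page, the URLs it links to, and the URLs those pages link to
}
```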
type_

The type_ parameter determines what the output of the Crawler job will contain. The output types break down like this:

- sitemap - a list of URLs found while crawling;
- parsed - an aggregate of parsed results;
- html - an aggregate of HTML results.
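The output type also determines which result endpoint you read from. Assuming sitemap output is fetched from /sitemap while html and parsed output are fetched from /aggregate (this page refers to the sitemap/aggregate result but does not spell out the mapping), a small helper might look like this:

```python
def result_endpoint(job_id, output_type):
    """Return the result URL for a finished job, based on its output type (sketch)."""
    base = f"https://ect.oxylabs.io/v1/jobs/{job_id}"
    return f"{base}/sitemap" if output_type == "sitemap" else f"{base}/aggregate"

print(result_endpoint("12345", "parsed"))
# https://ect.oxylabs.io/v1/jobs/12345/aggregate
```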