Other Domains
This data type is universal and can be applied to any domain. It accepts URLs along with additional parameters. You can find the list of available parameters in the table below.
Below is a quick overview of all the available data source values we support with other domains.
Source | Description | Structured data |
---|---|---|
universal | Submit any URL you like. |
Parameter | Description | Default Value |
---|---|---|
source | | universal |
url | Direct URL (link) to Universal page | N/A |
user_agent_type | | desktop |
geo_location | Geo location of proxy used to retrieve the data. The complete list of the supported locations can be found here. | N/A |
locale | Locale, as expected in the Accept-Language header. | N/A |
render | | N/A |
content_encoding | | base64 |
context: content | Base64-encoded POST request body. It is only useful if http_method is set to post. | N/A |
context: cookies | Pass your own cookies. | N/A |
context: follow_redirects | Indicate whether you would like the scraper to follow redirects (3xx responses with a destination URL) to get the contents of the URL at the end of the redirect chain. | N/A |
context: headers | Pass your own headers. | N/A |
context: http_method | Set it to post if you would like to make a POST request to your target URL. | get |
context: session_id | If you want to use the same proxy with multiple requests, you can do so by using this parameter. Just set your session to any string you like, and we will assign a proxy to this ID and keep it for up to 10 minutes. After that, if you make another request with the same session ID, a new proxy will be assigned to that particular session ID. | N/A |
context: successful_status_codes | Define a custom HTTP response code (or a few of them), upon which we should consider the scrape successful and return the content to you. May be useful if you want us to return the 503 error page or in some other non-standard cases. | N/A |
callback_url | | N/A |
parse | true will return structured data as long as you define parsing_instructions | false |
parsing_instructions | Define your own parsing and data transformation logic that will be executed on an HTML scraping result. Read more: Parsing instructions examples. | N/A |
- required parameter
If you are observing low success rates or retrieving empty content, try adding the "render": "html" parameter to your request. More info about the render parameter can be found here.
Simple request
In this example, we make a simple request to the Web Scraper API by including only the URL to be scraped. In a lot of cases this will be enough to yield a good result.
The same request is shown below in five forms, in this order: JSON payload, cURL, Python, PHP, and HTTP (GET URL).
{
"source": "universal",
"url": "https://sandbox.oxylabs.io/"
}
curl --user "user:pass1" \
'https://realtime.oxylabs.io/v1/queries' \
-H "Content-Type: application/json" \
-d '{"source": "universal", "url": "https://sandbox.oxylabs.io/"}'
from pprint import pprint
import requests
# Structure payload.
payload = {
    'source': 'universal',
    'url': 'https://sandbox.oxylabs.io/',
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)
# Instead of response with job status and results url, this will return the
# JSON response with the result.
pprint(response.json())
<?php
$params = [
    'source' => 'universal',
    'url' => 'https://sandbox.oxylabs.io/',
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://realtime.oxylabs.io/v1/queries");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($params));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERPWD, "user:pass1");
$headers = [];
$headers[] = "Content-Type: application/json";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
echo $result;
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close($ch);
# The whole string you submit has to be URL-encoded.
https://realtime.oxylabs.io/v1/queries?source=universal&url=https%3A%2F%2Fsandbox.oxylabs.io%2F
The example above uses the Realtime integration method. If you would like to use some other integration method in your query (e.g. Push-Pull or Proxy Endpoint), refer to the integration methods section.
You can always write your own parsing instructions with the Custom Parser feature and get structured data.
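The HTTP form of the simple request requires the whole query string to be URL-encoded. As a minimal sketch, Python's standard `urllib.parse.urlencode` can produce that string; the helper name `build_query_url` is hypothetical and not part of the API, and the endpoint is copied from the HTTP example above.

```python
from urllib.parse import urlencode

# Endpoint copied from the HTTP example in this section.
REALTIME_ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def build_query_url(params: dict) -> str:
    """Append URL-encoded query parameters to the Realtime endpoint.

    Hypothetical helper for illustration only.
    """
    return f"{REALTIME_ENDPOINT}?{urlencode(params)}"


query_url = build_query_url(
    {"source": "universal", "url": "https://sandbox.oxylabs.io/"}
)
print(query_url)
```

The printed URL should match the encoded HTTP example, with the target URL's `:` and `/` characters percent-encoded.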
All parameters
In this example, all available parameters are included (though not always necessary or compatible within the same request) to give you an idea of how to format your requests.
{
"source": "universal",
"url": "https://example.com",
"user_agent_type": "desktop",
"geo_location": "United States",
"context": [
{
"key": "headers",
"value": {
"Content-Type": "application/octet-stream",
"Custom-Header-Name": "custom header content"
}
},
{
"key": "cookies",
"value": [
{
"key": "NID",
"value": "1234567890"
},
{
"key": "1P_JAR",
"value": "0987654321"
}]
},
{
"key": "follow_redirects",
"value": true
},
{
"key": "http_method", "value": "get"
},
{
"key": "content",
"value": "base64EncodedPOSTBody"
},
{
"key": "successful_status_codes",
"value": [303, 808, 909]
}]
}
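The `context` array in the payload above is a list of `key`/`value` objects, and `cookies` uses the same shape one level down. A minimal sketch of building it from plain Python dictionaries might look like this; `to_key_value_list` is a hypothetical helper name, and the values are the placeholder ones from the example.

```python
import json


def to_key_value_list(options: dict) -> list:
    """Convert {"name": value} pairs into [{"key": name, "value": value}, ...].

    Hypothetical helper; it only mirrors the shape used in the example payload.
    """
    return [{"key": key, "value": value} for key, value in options.items()]


payload = {
    "source": "universal",
    "url": "https://example.com",
    "user_agent_type": "desktop",
    "geo_location": "United States",
    "context": to_key_value_list({
        "follow_redirects": True,
        "http_method": "get",
        # Cookies are themselves a list of key/value objects.
        "cookies": to_key_value_list({"NID": "1234567890"}),
    }),
}

print(json.dumps(payload, indent=2))
```

Keeping the options in a dictionary first makes it easier to add or drop individual `context` entries before serializing the request.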
The full list of the supported geo locations can be found below.
universal-supported-geo_location-values-web-scraper.csv
Here is an example:
"United Arab Emirates",
"Albania",
"Armenia",
"Angola",
"Argentina",
"Australia",
...
"Uruguay",
"Uzbekistan",
"Venezuela Bolivarian Republic of",
"Viet Nam",
"South Africa",
"Zimbabwe"
Universal Crawler supports two HTTP(S) methods: GET (default) and POST.
"GET",
"POST"
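For a POST request, the parameter table says that `http_method` should be set to `post` and that `content` carries the Base64-encoded request body. The following is a rough sketch of assembling such a payload; `build_post_payload` is a hypothetical helper, and the target URL and form body are placeholders.

```python
import base64


def build_post_payload(url: str, body: bytes) -> dict:
    """Build a universal-source payload that asks for a POST to `url`.

    Hypothetical helper: the body is Base64-encoded and passed as the
    `content` context value, alongside `http_method` set to `post`.
    """
    encoded_body = base64.b64encode(body).decode("ascii")
    return {
        "source": "universal",
        "url": url,
        "context": [
            {"key": "http_method", "value": "post"},
            {"key": "content", "value": encoded_body},
        ],
    }


# Placeholder target and body, for illustration only.
post_payload = build_post_payload("https://example.com", b"name=value")
print(post_payload)
```

The resulting dictionary can be sent the same way as the payload in the simple request example.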
Using the locale parameter will allow you to change your target page's web interface language (not results). For example, if you use domain com and locale parameter de-DE, the results will still be American, but the Accept-Language header value will be set to de-DE,de;q=0.8. This would imitate a person from the US searching in the com domain who has their browser UI set to German. If you don't use this parameter, we will set the Accept-Language header to match the domain (i.e. en-US for com).
[
{
"locale":{
"en-ai":{
"description":"Anguilla - English",
"domain":"com.ai"
},
"es-pr":{
"description":"Puerto Rico - Spanish",
"domain":"com.pr"
},
...
"en-by":{
"description":"Belarus - English",
"domain":"by"
},
"en-in":{
"description":"India - English",
"domain":"co.in"
}
}
}
]
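To illustrate the header value described for the locale parameter, here is a small sketch that derives an Accept-Language string from a locale. The source only states the `de-DE` case (`de-DE,de;q=0.8`); applying the same pattern to other locales is an assumption, and `accept_language` is a hypothetical function that does not call the API.

```python
def accept_language(locale: str) -> str:
    """Return an Accept-Language value shaped like the de-DE example.

    Assumption: the pattern "<locale>,<language>;q=0.8" generalizes
    beyond the single case given in the documentation.
    """
    language = locale.split("-")[0]
    return f"{locale},{language};q=0.8"


print(accept_language("de-DE"))
```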