Web Scraper API

Web Scraper API is designed to collect public data from any website.

Getting started

Create your API user credentials: sign up for a free trial or purchase the product in the Oxylabs dashboard to get your API user credentials (USERNAME and PASSWORD).

If you need more than one API user for your account, please contact our customer support or message our 24/7 live chat support.

Request sample

curl 'https://realtime.oxylabs.io/v1/queries' \
--user 'USERNAME:PASSWORD' \
-H 'Content-Type: application/json' \
-d '{
        "source": "universal",
        "url": "https://sandbox.oxylabs.io/"
    }'

We use the synchronous Realtime integration method in our examples. If you would like to use the Proxy Endpoint or the asynchronous Push-Pull integration instead, refer to the integration methods section.
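If you prefer Python to curl, the same Realtime request can be sketched with the standard library alone. This is a minimal illustration, not an official client; substitute your own credentials for the USERNAME and PASSWORD placeholders.

```python
import base64
import json
import urllib.request

# Sketch: the same Realtime request as the curl sample, built with
# the Python standard library. USERNAME/PASSWORD are placeholders.

def build_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the synchronous Realtime request."""
    payload = {
        "source": "universal",
        "url": "https://sandbox.oxylabs.io/",
    }
    # HTTP Basic auth, equivalent to curl's --user 'USERNAME:PASSWORD'
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url="https://realtime.oxylabs.io/v1/queries",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )

# To actually send it:
# with urllib.request.urlopen(build_request("USERNAME", "PASSWORD")) as resp:
#     print(json.loads(resp.read()))
```

Because the request is prepared separately from being sent, you can inspect the payload and headers before spending any traffic.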

Output example
{
    "results": [
        {
            "content": "<!DOCTYPE html>
            CONTENT
            </html>",
            "created_at": "2024-06-26 13:51:38",
            "updated_at": "2024-06-26 13:51:39",
            "page": 1,
            "url": "https://sandbox.oxylabs.io/",
            "job_id": "7211727814179962881",
            "status_code": 200,
            "session_info": {
                "id": null,
                "expires_at": null,
                "remaining": null
            }
        }
    ]
}

Request parameter values

  1. source - This parameter sets the scraper that will be used to process your request.

  2. url - Provide the URL of the target you want to scrape, for example:

    1. Real Estate: Idealista, Redfin, Zillow, Zoopla

    2. Travel: Airbnb, Agoda, Booking, TripAdvisor

    3. Company data: Crunchbase, ZoomInfo, AngelList, Product Hunt

    4. Entertainment: Netflix, SoundCloud, YouTube, IMDb

    5. Automotive: AutoEurope, Autotrader, RockAuto, Halfords

    6. Any other website.

Both source and url are mandatory parameters.

Additional parameters

These optional parameters enable additional features. The default value, where one exists, is noted at the end of each entry.

geo_location - Geo-location of the proxy used to retrieve the data. The complete list of the supported locations can be found here. Default: -

render - Set to html to enable JavaScript rendering. More info. NOTE: If you are observing low success rates or retrieving empty content, try adding this parameter. Default: -

browser_instructions - Define your own browser instructions to be executed when rendering JavaScript. More info. Default: -

parse - Set to true to return structured data, provided you also define parsing_instructions. Default: false

parsing_instructions - Define your own parsing and data transformation logic to be executed on an HTML scraping result. Read more: Parsing instructions examples. Default: -

context: headers - Pass your own headers. Learn more here. Default: -

context: cookies - Pass your own cookies. Learn more here. Default: -

context: session_id - To use the same proxy for multiple requests, set this parameter to any string of your choice; we will assign a proxy to that ID and keep the pairing for up to 10 minutes. Once the session expires, a request with the same session ID is assigned a new proxy. Default: -

user_agent_type - Device type and browser. The full list can be found here. Default: desktop

context: http_method - Set to post to make a POST request to your target URL. Learn more here. Default: get

context: content - Base64-encoded POST request body. Only applicable when http_method is set to post. Learn more here. Default: -

content_encoding - Set to base64 if you want to download images. Learn more here. Default: -

context: follow_redirects - Set to true to enable the scraper to follow redirects. By default, redirects are followed up to a limit of 10, and the entire chain is treated as one scraping job. Default: -

context: successful_status_codes - Define one or more custom HTTP response codes upon which we should consider the scrape successful and return the content to you. Useful if you want us to return, for example, a 503 error page, or in other non-standard cases. Default: -
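Since the context: content parameter above expects the POST body base64-encoded, here is a minimal Python sketch of preparing it. The body fields are made-up examples; your target site defines what the actual POST body should contain.

```python
import base64
import json

# Hypothetical POST body for the target site -- the field names here
# are illustrative, not part of the Web Scraper API itself.
post_body = json.dumps({"search": "adidas", "page": 2})

# `context: content` expects the body base64-encoded, and only takes
# effect when `context: http_method` is set to "post".
encoded_body = base64.b64encode(post_body.encode()).decode()

context = [
    {"key": "http_method", "value": "post"},
    {"key": "content", "value": encoded_body},
]
```

The resulting context list can be dropped into a request payload like the sample below.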

Request sample

In this sample, we include some parameters from the table above. Note that not all of these parameters are necessary, or even mutually compatible, within a single request; the sample simply illustrates how to format them.

{
    "source": "universal", 
    "url": "https://example.com", 
    "user_agent_type": "desktop",
    "geo_location": "United States",
    "context": [
        {
            "key": "headers", 
            "value": {
                "Content-Type": "application/octet-stream", 
                "Custom-Header-Name": "custom header content"
            }
        }, 
        {
            "key": "cookies", 
            "value": [
                {
                    "key": "NID", 
                    "value": "1234567890"
                },
                {
                    "key": "1P_JAR",
                    "value": "0987654321"
                }
            ]
        },
        {
            "key": "follow_redirects",
            "value": true
        },
        {
            "key": "http_method", 
            "value": "post"
        },
        {
            "key": "content",
            "value": "base64EncodedPOSTBody"
        },
        {
            "key": "successful_status_codes",
            "value": [204, 808, 909]
        }
    ],
    "parse": true,
    "parsing_instructions": {
        "title": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//h1/text()"]
                }
            ]
        }
    }
}
Output example
{
    "results": [
        {
            "content": {
                "title": "Example Domain",
                "parse_status_code": 12000
            },
            "created_at": "2024-06-26 13:24:40",
            "updated_at": "2024-06-26 13:24:42",
            "page": 1,
            "url": "https://example.com/",
            "job_id": "7211721028764980225",
            "status_code": 200,
            "parser_type": "custom",
            "session_info": {
                "id": null,
                "expires_at": null,
                "remaining": null
            }
        }
    ]
}
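The context: session_id parameter from the table can be combined in the same way. A minimal sketch, assuming you want two pages fetched through the same proxy (the URLs and the session string are placeholders):

```python
import json

# Reuse one proxy across requests by sending the same session_id
# (any string you like; "my-session-1" is a made-up example) within
# the 10-minute session window.
def payload_with_session(url: str, session_id: str) -> str:
    return json.dumps({
        "source": "universal",
        "url": url,
        "context": [{"key": "session_id", "value": session_id}],
    })

first = payload_with_session("https://example.com/page/1", "my-session-1")
second = payload_with_session("https://example.com/page/2", "my-session-1")
```

Both payloads carry the same session ID, so as long as the second request is sent before the session expires, it will go out through the proxy assigned to the first.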

If you need any assistance in making your first request, feel free to contact us via the 24/7 available live chat.

Testing via Scraper APIs Playground

Log in to the Oxylabs dashboard and try Web Scraper API in the Scraper APIs Playground.

Testing via Postman

Get started with our API using Postman, a handy tool for making HTTP requests. Download our Web Scraper API Postman collection and import it. This collection includes examples that demonstrate the functionality of the scraper. Customize the examples to your needs or start scraping right away.

For step-by-step instructions, watch our video tutorial below. If you're new to Postman, check out this short guide.

All information herein is provided on an “as is” basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on this page. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website’s terms of service or receive a scraping license.
