
Cloud Storage

Retrieve your scraped results directly from your own S3, GCS, OSS, or other S3-compatible storage.

Scraper API job results are stored in our storage. You can retrieve them by sending a GET request to the /results endpoint.

As an alternative, we can upload the results to your cloud storage. This way, you don't have to make extra requests to fetch results – everything goes directly to your storage bucket.

Note: Cloud storage integration works only with the Push-Pull integration method.

Currently, we support these cloud storage services:

  • Google Cloud Storage

  • Amazon S3

  • BytePlus TOS

  • Alibaba Cloud OSS and other S3-compatible storages

If you'd like to use a different type of storage, please contact your account manager to discuss the feature delivery timeline.

The upload path looks like this: YOUR_BUCKET_NAME/job_ID.json. You'll find the job ID in the response that you receive from us after submitting a job.

Input

storage_type
  Description: Your cloud storage type.
  Valid values: gcs (Google Cloud Storage); s3 (AWS S3); tos (BytePlus TOS); s3_compatible (any S3-compatible storage).

storage_url
  Description: Your cloud storage bucket name / URL.
  Valid values: any s3, gcs, or tos bucket name; any S3-compatible storage URL.

Google Cloud Storage

The payload below makes Web Scraper API scrape https://example.com and put the result on a Google Cloud Storage bucket.
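A minimal sketch of such a payload, assuming the universal source (your source and other job parameters may differ):

    {
      "source": "universal",
      "url": "https://example.com",
      "storage_type": "gcs",
      "storage_url": "YOUR_BUCKET_NAME"
    }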

To get your job results uploaded to your Google Cloud Storage bucket, please set up special permissions for our service as shown below:

1. Create a custom role.

2. Add the storage.objects.create permission.

3. Assign it to Oxylabs. In the New members field, enter the following Oxylabs service account email:

Amazon S3

The payload below makes Web Scraper API scrape https://example.com and put the result on an Amazon S3 bucket.
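A minimal sketch of such a payload, assuming the universal source (illustrative; swap in your own job parameters and bucket name):

    {
      "source": "universal",
      "url": "https://example.com",
      "storage_type": "s3",
      "storage_url": "YOUR_BUCKET_NAME"
    }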

To get your job results uploaded to your Amazon S3 bucket, please set up access permissions for our service. To do that, go to https://s3.console.aws.amazon.com/ and navigate to S3 → Storage → Bucket Name (if you don't have one, create a new one) → Permissions → Bucket Policy.

You can find the bucket policy attached below or in the code sample area.

s3 bucket policy
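A sketch of the policy shape, matching the permissions described below; the Principal value is a placeholder for the Oxylabs principal ARN given in the code sample area:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "OxylabsUpload",
          "Effect": "Allow",
          "Principal": {
            "AWS": "OXYLABS_PRINCIPAL_ARN"
          },
          "Action": [
            "s3:GetBucketLocation",
            "s3:PutObject",
            "s3:PutObjectAcl"
          ],
          "Resource": [
            "arn:aws:s3:::YOUR_BUCKET_NAME",
            "arn:aws:s3:::YOUR_BUCKET_NAME/*"
          ]
        }
      ]
    }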

Don't forget to replace YOUR_BUCKET_NAME with your bucket's name. This policy allows us to write to your bucket, grant you access to the uploaded files, and determine the bucket's location.

Alibaba Cloud Object Storage Service (OSS)

The payload below makes Web Scraper API scrape https://example.com and put the result on an Alibaba Cloud OSS bucket.
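A minimal sketch of such a payload; OSS is used through the S3-compatible interface, so the storage_url embeds your credentials as described in the next section (all values illustrative):

    {
      "source": "universal",
      "url": "https://example.com",
      "storage_type": "s3_compatible",
      "storage_url": "https://ACCESS_KEY_ID:ACCESS_KEY_SECRET@BUCKET_NAME.oss-REGION.aliyuncs.com"
    }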

Forming the Storage URL

Storage URL format:
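A sketch, assuming the ACCESS_KEY:SECRET auth-string convention described under Other S3-compatible storage below, with the standard OSS public endpoint domain:

    https://ACCESS_KEY_ID:ACCESS_KEY_SECRET@BUCKET_NAME.oss-REGION.aliyuncs.com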

Note: You'll find the BUCKET_NAME and oss-REGION of your bucket in the bucket overview of your OSS console.

Creating the Access Key and Secret

To use the S3-compatible interface with Alibaba OSS, you must create the ACCESS_KEY_ID and ACCESS_KEY_SECRET as shown below. For more information, see How to use Amazon S3 SDKs to access OSS.

1. Go to the AccessKey account menu.

2. Log on to the RAM console. Access the RAM console by using an Alibaba Cloud account or a RAM user who has administrative rights.

3. Go to Identities → Users in the left-side navigation pane.

4. Select Create User and use the RAM User AccessKey option.

5. Grant permissions to the RAM user. The newly created RAM user has no permissions. You must grant AliyunOSSFullAccess permissions to the RAM user. Then, the RAM user can access the required Alibaba Cloud resources. For more information, see Grant permissions to RAM users.

6. Get your AccessKey ID and AccessKey Secret. When permissions are granted, return to the Authentication section and, in the Access Key section, select Create AccessKey. Choose to create an AccessKey for a third-party service. You'll then see an ACCESS_KEY_ID and ACCESS_KEY_SECRET, which you can then use in your requests.

Alibaba OSS Rate Limits

When doing concurrent uploads to Alibaba OSS, it's possible to hit their account/bucket rate limits, and the uploads will start timing out with a rate-limit error.

In this case, please contact Alibaba OSS support to increase your OSS rate limits.

BytePlus TOS

You can upload scraped results directly to a BytePlus Torch Object Storage (TOS) bucket. Please note that you must have your bucket set up correctly and have both your access key and secret key available for cloud storage access.

The example payload below makes Web Scraper API scrape https://example.com and put the result on a BytePlus TOS bucket.
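A minimal sketch of such a payload; the source value is illustrative, and the storage_url follows the format described below:

    {
      "source": "universal",
      "url": "https://example.com",
      "storage_type": "tos",
      "storage_url": "https://ACCESS_KEY:SECRET_KEY@tos-cn-hongkong.bytepluses.com/YOUR_BUCKET_NAME/YOUR_PATH"
    }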

Parameters

storage_type
  Value: tos
  Description: Specifies BytePlus TOS as the storage provider.

storage_url
  Value: String (URL)
  Description: Authenticated URL to your TOS bucket (see format below).

Storage URL Format

The storage_url must be constructed using your TOS credentials and bucket details.
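A sketch, assuming the same credentials-in-URL convention used for other S3-compatible storage, built from the components listed below:

    https://{access_key}:{secret_key}@{endpoint}/{bucket_name}/{path}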

  • access_key – Your BytePlus access key ID.

  • secret_key – Your BytePlus secret access key.

  • endpoint – The region-specific endpoint (e.g., tos-cn-hongkong.bytepluses.com).

  • bucket_name – Destination bucket name.

  • path – (Optional) A specific folder path within the bucket.


Output File Naming

Oxylabs automatically generates filenames for the uploaded objects based on the job details:

  • HTML/Content: {query_id}_{timestamp}.html

  • Parsed Data: {query_id}_results.json

Files will be accessible in your bucket at: tos://{bucket_name}/{path}/{filename}
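For example, parsed results for a job with query ID 1234567890, uploaded with the path results, would land at tos://YOUR_BUCKET_NAME/results/1234567890_results.json (illustrative values).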

Other S3-compatible storage

If you'd like to get your results delivered to an S3-compatible storage location, you'll have to include your bucket's ACCESS_KEY:SECRET auth string in the storage_url value in the payload:
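A minimal sketch (the source value, host, and bucket name are illustrative):

    {
      "source": "universal",
      "url": "https://example.com",
      "storage_type": "s3_compatible",
      "storage_url": "https://ACCESS_KEY:SECRET@storage.example.com/YOUR_BUCKET_NAME"
    }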
