LlamaIndex

Leverage the LlamaIndex integration with the Oxylabs Web Scraper API to easily ingest online content and build LLM-driven workflows.

With this integration, you can scrape web data and process it with a large language model (LLM) in a single workflow.

Overview

LlamaIndex is a data framework designed for building LLM applications with external data sources. Use it with Oxylabs Web Scraper API to:

  • Scrape structured data without handling CAPTCHAs, IP blocks, or JS rendering

  • Process results with an LLM in the same pipeline

  • Build end-to-end workflows from extraction to AI-powered output

Getting started

Create your API user credentials (USERNAME and PASSWORD): sign up for a free trial or purchase the product in the Oxylabs dashboard.

If you need more than one API user for your account, please contact our customer support or message our 24/7 live chat support.

Environment setup

This guide uses Python. Install the required libraries using pip:

pip install -qU llama-index llama-index-readers-oxylabs llama-index-readers-web

Create a .env file in your project directory with your Oxylabs Web Scraper API credentials and OpenAI API key:

OXYLABS_USERNAME=your_API_username
OXYLABS_PASSWORD=your_API_password
OPENAI_API_KEY=your-openai-key

Load these environment variables in your Python script:
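A minimal sketch of this step, using only the standard library. The helper name `load_env_file` is hypothetical. The `python-dotenv` package and its `load_dotenv()` function are a common alternative, but that package is not part of the install command above, so it is not assumed here.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ and return them."""
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # nothing to load; rely on variables already exported
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded


load_env_file()

OXYLABS_USERNAME = os.getenv("OXYLABS_USERNAME")
OXYLABS_PASSWORD = os.getenv("OXYLABS_PASSWORD")
```

The OpenAI client libraries typically read OPENAI_API_KEY from the environment on their own, so it does not need its own variable in the script.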

Integration methods

There are two ways to access web content via Web Scraper API in LlamaIndex:

Oxylabs Reader

The llama-index-readers-oxylabs module contains specific classes that enable you to scrape data from various sources:

API Data Source       Reader Class
--------------------  ------------------------------
Google Web Search     OxylabsGoogleSearchReader
Google Search Ads     OxylabsGoogleAdsReader
Amazon Product        OxylabsAmazonProductReader
Amazon Search         OxylabsAmazonSearchReader
Amazon Reviews        OxylabsAmazonReviewsReader
YouTube Transcript    OxylabsYoutubeTranscriptReader

For example, you can extract Google search results:
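A minimal sketch of such a request, assuming the credentials are already set as environment variables. The helper names `build_google_search_payload` and `search_google` and the sample query are illustrative, not part of the library. The reader is given a payload dictionary, and `parse` asks the API for structured results.

```python
import os


def build_google_search_payload(query, parse=True):
    """Assemble the request payload passed to the reader's load_data()."""
    return {"query": query, "parse": parse}


def search_google(query):
    """Run a Google search through the Oxylabs reader and return documents."""
    # Imported lazily so the payload helper works without the package installed.
    from llama_index.readers.oxylabs import OxylabsGoogleSearchReader

    reader = OxylabsGoogleSearchReader(
        username=os.environ["OXYLABS_USERNAME"],
        password=os.environ["OXYLABS_PASSWORD"],
    )
    return reader.load_data(build_google_search_payload(query))


# Only call the API when credentials are available.
if __name__ == "__main__" and os.getenv("OXYLABS_USERNAME"):
    documents = search_google("best pancake recipe")
    print(documents[0].text)
```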

Oxylabs Web Reader

With the OxylabsWebReader class, you can extract data from any URL:
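A sketch of how this might look, assuming credentials are set as environment variables. `make_web_reader` and `scrape_pages` are hypothetical helper names, and the URL is a placeholder. `OxylabsWebReader` is imported from the `llama-index-readers-web` package installed earlier.

```python
import os


def make_web_reader():
    """Create an OxylabsWebReader from credentials in the environment."""
    # Imported lazily so scrape_pages() can be used with any compatible reader.
    from llama_index.readers.web import OxylabsWebReader

    return OxylabsWebReader(
        username=os.environ["OXYLABS_USERNAME"],
        password=os.environ["OXYLABS_PASSWORD"],
    )


def scrape_pages(reader, urls):
    """Return the text of every document the reader produces for the URLs."""
    documents = reader.load_data(list(urls))
    return [document.text for document in documents]


# Only call the API when credentials are available.
if __name__ == "__main__" and os.getenv("OXYLABS_USERNAME"):
    for text in scrape_pages(make_web_reader(), ["https://example.com"]):
        print(text[:500])  # preview the first 500 characters of each page
```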

Building a basic AI search agent

Below is an example of a simple AI agent that can search Google and answer questions:
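One way such an agent could be put together, as a sketch rather than a definitive implementation. It assumes the `FunctionAgent` workflow class from `llama_index.core` and the OpenAI LLM wrapper that ships with the `llama-index` package. The model name, the system prompt, the question, and the `make_web_search` helper are all illustrative choices.

```python
import asyncio
import os


def make_web_search(reader):
    """Wrap an Oxylabs Google reader as a plain function an agent can call."""

    def web_search(query: str) -> str:
        """Search Google for the query and return the parsed results as text."""
        results = reader.load_data({"query": query, "parse": True})
        return results[0].text if results else "No results found."

    return web_search


async def main():
    from llama_index.core.agent.workflow import FunctionAgent
    from llama_index.llms.openai import OpenAI
    from llama_index.readers.oxylabs import OxylabsGoogleSearchReader

    reader = OxylabsGoogleSearchReader(
        username=os.environ["OXYLABS_USERNAME"],
        password=os.environ["OXYLABS_PASSWORD"],
    )
    agent = FunctionAgent(
        tools=[make_web_search(reader)],
        # Assumed model name; the key is read from OPENAI_API_KEY.
        llm=OpenAI(model="gpt-4o-mini"),
        system_prompt=(
            "Use the `web_search` tool to look up current information on "
            "Google, then answer the question based on the results."
        ),
    )
    response = await agent.run("What is the latest stable Python release?")
    print(response)


# Only call the APIs when credentials are available.
if __name__ == "__main__" and os.getenv("OXYLABS_USERNAME"):
    asyncio.run(main())
```

Keeping the search tool as a plain function with a docstring and type hints gives the agent a description and schema to work from when deciding whether to call it.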

Advanced configuration

Handling dynamic content

The Web Scraper API can handle JavaScript rendering:
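A short sketch, assuming the reader's `load_data` accepts a dictionary of extra API parameters as its second argument. `render` set to `html` is the Web Scraper API option for JavaScript rendering. The helper name `scrape_rendered` and the target URL (a public JavaScript-rendered demo page) are examples only.

```python
import os


def scrape_rendered(reader, urls):
    """Scrape the URLs with JavaScript rendering enabled."""
    return reader.load_data(list(urls), {"render": "html"})


# Only call the API when credentials are available.
if __name__ == "__main__" and os.getenv("OXYLABS_USERNAME"):
    from llama_index.readers.web import OxylabsWebReader

    reader = OxylabsWebReader(
        username=os.environ["OXYLABS_USERNAME"],
        password=os.environ["OXYLABS_PASSWORD"],
    )
    documents = scrape_rendered(reader, ["https://quotes.toscrape.com/js/"])
    print(documents[0].text)
```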

Setting user agent type

You can specify different user agents:
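As an illustration, the user agent can be passed as an extra API parameter. The sketch assumes `load_data` takes a parameter dictionary as its second argument and that values such as `desktop`, `mobile`, and `tablet` are accepted for `user_agent_type`. The helper name and URL are placeholders.

```python
import os


def scrape_with_user_agent(reader, urls, user_agent_type="desktop"):
    """Scrape the URLs while presenting the given device type."""
    params = {"user_agent_type": user_agent_type}
    return reader.load_data(list(urls), params)


# Only call the API when credentials are available.
if __name__ == "__main__" and os.getenv("OXYLABS_USERNAME"):
    from llama_index.readers.web import OxylabsWebReader

    reader = OxylabsWebReader(
        username=os.environ["OXYLABS_USERNAME"],
        password=os.environ["OXYLABS_PASSWORD"],
    )
    documents = scrape_with_user_agent(reader, ["https://example.com"], "mobile")
    print(documents[0].text)
```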

Using target-specific parameters

Many target-specific scrapers support additional parameters:
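For instance, a Google search request might carry paging and localization options alongside the query. This is a sketch. `domain`, `start_page`, `pages`, and `geo_location` are Google search parameters of the Web Scraper API, while the helper name `build_search_payload` and the sample values are assumptions made for illustration.

```python
import os


def build_search_payload(query, domain="com", start_page=1, pages=1,
                         geo_location=None):
    """Build a Google search payload with target-specific options."""
    payload = {
        "query": query,
        "parse": True,             # return structured results
        "domain": domain,          # e.g. "com" for google.com
        "start_page": start_page,  # first results page to fetch
        "pages": pages,            # number of result pages to fetch
    }
    if geo_location is not None:
        payload["geo_location"] = geo_location
    return payload


# Only call the API when credentials are available.
if __name__ == "__main__" and os.getenv("OXYLABS_USERNAME"):
    from llama_index.readers.oxylabs import OxylabsGoogleSearchReader

    reader = OxylabsGoogleSearchReader(
        username=os.environ["OXYLABS_USERNAME"],
        password=os.environ["OXYLABS_PASSWORD"],
    )
    payload = build_search_payload("iphone", start_page=2, pages=3)
    for document in reader.load_data(payload):
        print(document.text[:300])
```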

Creating vector indices

LlamaIndex is particularly useful for building vector indices from web content:
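A sketch of one possible pipeline, from scraping to querying. It uses `VectorStoreIndex` from `llama_index.core` with its default settings, which rely on OpenAI for embeddings and answers and therefore need OPENAI_API_KEY. The helper names, the URL, and the question are placeholders.

```python
import os


def drop_empty(documents):
    """Skip documents whose text is empty; they add nothing to the index."""
    return [doc for doc in documents if doc.text and doc.text.strip()]


def build_query_engine(documents):
    """Embed the scraped documents and expose them through a query engine."""
    from llama_index.core import VectorStoreIndex

    index = VectorStoreIndex.from_documents(drop_empty(documents))
    return index.as_query_engine()


def main():
    from llama_index.readers.web import OxylabsWebReader

    reader = OxylabsWebReader(
        username=os.environ["OXYLABS_USERNAME"],
        password=os.environ["OXYLABS_PASSWORD"],
    )
    documents = reader.load_data(["https://example.com"])
    query_engine = build_query_engine(documents)
    print(query_engine.query("What is this page about?"))


# Only call the APIs when credentials are available.
if __name__ == "__main__" and os.getenv("OXYLABS_USERNAME"):
    main()
```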
