The LangChain integration with the Oxylabs Web Scraper API enables you to collect and process web data through an LLM (Large Language Model) in the same workflow.
Overview
LangChain is a framework for building apps that use LLMs alongside tools, APIs, and web data. It supports both Python and JavaScript. Use it with Oxylabs Web Scraper API to:
Scrape structured data without handling CAPTCHAs, IP blocks, or JS rendering
Process results with an LLM in the same pipeline
Build end-to-end workflows from extraction to AI-powered output
Getting started
Create your API user credentials: sign up for a free trial or purchase the product in the Oxylabs dashboard to create your API user credentials (USERNAME and PASSWORD).
If you need more than one API user for your account, please contact our customer support or message our 24/7 live chat support.
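The examples in this guide read credentials from environment variables. A common pattern is to keep them in a `.env` file at the project root (the values below are placeholders; the variable names match the code later in this guide):

```
OXYLABS_USERNAME=your_api_username
OXYLABS_PASSWORD=your_api_password
OPENAI_API_KEY=your_openai_api_key
```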
This guide uses Python. Install the required libraries using pip:
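The exact set depends on which examples below you run; the following covers everything imported in this guide (package names as published on PyPI):

```shell
pip install requests python-dotenv langchain langchain-openai langchain-oxylabs langgraph
```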
Then load the environment variables in your Python script:
import os
from dotenv import load_dotenv
load_dotenv()
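After `load_dotenv()` runs, it can help to fail fast when credentials are missing rather than at request time. A minimal sketch (`require_env` is a hypothetical helper, not part of any library used here):

```python
import os

def require_env(names):
    """Return the subset of environment variable names that are unset or empty."""
    return [n for n in names if not os.getenv(n)]

missing = require_env(["OXYLABS_USERNAME", "OXYLABS_PASSWORD"])
if missing:
    print(f"Missing environment variables: {missing}")
```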
Integration methods
There are two primary ways to integrate Oxylabs Web Scraper API with LangChain:
Using langchain-oxylabs package
For Google search queries, use the dedicated langchain-oxylabs package, which provides a ready-to-use integration:
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
from langchain_oxylabs import OxylabsSearchAPIWrapper, OxylabsSearchRun
load_dotenv()
# Initialize your preferred LLM model
llm = init_chat_model(
    "gpt-4o-mini",
    model_provider="openai",
    api_key=os.getenv("OPENAI_API_KEY")
)

# Initialize the Google search tool
search = OxylabsSearchRun(
    wrapper=OxylabsSearchAPIWrapper(
        oxylabs_username=os.getenv("OXYLABS_USERNAME"),
        oxylabs_password=os.getenv("OXYLABS_PASSWORD")
    )
)
# Create an agent that uses the Google search tool
agent = create_react_agent(llm, [search])
# Example usage
user_input = "When and why did the Maya civilization collapse?"
response = agent.invoke({"messages": [("user", user_input)]})
print(response["messages"][-1].content)
Using the Web Scraper API
For websites beyond Google search, you can send requests directly to the Web Scraper API:
import os
import requests
from dotenv import load_dotenv
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate
load_dotenv()
def scrape_website(url):
    """Scrape the website using Oxylabs Web Scraper API."""
    payload = {
        "source": "universal",
        "url": url,
        "parse": True
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(os.getenv("OXYLABS_USERNAME"), os.getenv("OXYLABS_PASSWORD")),
        json=payload
    )
    if response.status_code == 200:
        data = response.json()
        content = data["results"][0]["content"]
        return str(content)
    else:
        print(f"Failed to scrape website: {response.text}")
        return None

def process_content(content):
    """Process the scraped content using LangChain."""
    if not content:
        print("No content to process.")
        return None
    prompt = PromptTemplate.from_template(
        "Analyze the following website content and summarize key points: {content}"
    )
    chain = prompt | OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    result = chain.invoke({"content": content})
    return result

def main(url):
    print("Scraping website...")
    scraped_content = scrape_website(url)
    if scraped_content:
        print("Processing scraped content with LangChain...")
        analysis = process_content(scraped_content)
        print("\nProcessed Analysis:\n", analysis)
    else:
        print("No content scraped.")

if __name__ == "__main__":
    url = "https://sandbox.oxylabs.io/products/1"
    main(url)
Target-specific scrapers
Oxylabs provides specialized scrapers for various popular websites. Here are some examples of available sources:

| Website | Source parameter | Parameters |
| --- | --- | --- |
| Google | google_search | query |
| Amazon | amazon_search | query, domain (optional) |
| Walmart | walmart_search | query |
| Target | target_search | query |
| Kroger | kroger_search | query, store_id |
| Staples | staples_search | query |
To use a specific scraper, modify the payload in the scrape_website function:
# Example for Amazon search
payload = {
    "source": "amazon_search",
    "query": "smartphone",
    "domain": "com",
    "parse": True
}
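If you target several sources, one option is to build payloads with a small helper instead of editing scrape_website each time. A sketch, assuming the parameters from the table above (`amazon_search_payload` is a hypothetical helper, not part of the Oxylabs API):

```python
def amazon_search_payload(query, domain="com"):
    """Build a Web Scraper API payload for an Amazon search."""
    # "domain" selects the Amazon marketplace, e.g. "com" or "co.uk"
    return {
        "source": "amazon_search",
        "query": query,
        "domain": domain,
        "parse": True,
    }

payload = amazon_search_payload("smartphone")
```

The returned dictionary can be passed as the `json=` body of the same `requests.post` call shown earlier.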
Advanced configuration
Handling dynamic content
The Web Scraper API can handle JavaScript rendering by adding the render parameter:
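For example, extending the universal payload used earlier (a sketch; `"render": "html"` is the value Oxylabs documents for executing a page's JavaScript before returning the result, but check the current API reference for the full set of render options):

```python
# Same payload as before, with JavaScript rendering enabled
payload = {
    "source": "universal",
    "url": "https://sandbox.oxylabs.io/products/1",
    "render": "html",  # execute JavaScript before capturing the page
    "parse": True
}
```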