# LangChain

The **LangChain** integration with the [**Oxylabs Web Scraper API**](https://oxylabs.io/products/scraper-api/web) enables you to collect and process web data through an LLM (Large Language Model) in the same workflow.

## Overview

**LangChain** is a framework for building apps that use LLMs alongside tools, APIs, and web data. It supports both Python and JavaScript. Use it with the [**Oxylabs Web Scraper API**](https://developers.oxylabs.io/scraper-apis/web-scraper-api) to:

* Scrape structured data without handling CAPTCHAs, IP blocks, or JS rendering
* Process results with an LLM in the same pipeline
* Build end-to-end workflows from extraction to AI-powered output

## Getting started

**Create your API user credentials**: sign up for a free trial or purchase the product in the [**Oxylabs dashboard**](https://dashboard.oxylabs.io/en/registration) to generate your API user credentials (`USERNAME` and `PASSWORD`).

{% hint style="warning" %}
If you need more than one API user for your account, please contact our [**customer support**](mailto:support@oxylabs.io) or message our 24/7 live chat support.
{% endhint %}

This guide uses Python. Install the required libraries with pip:

```bash
pip install -qU langchain-oxylabs langchain-openai langgraph requests python-dotenv
```

## Environment setup

Create a `.env` file in your project directory with your Oxylabs API user and OpenAI credentials:

```
OXYLABS_USERNAME=your-username
OXYLABS_PASSWORD=your-password
OPENAI_API_KEY=your-openai-key
```

Load these environment variables in your Python script:

```python
import os
from dotenv import load_dotenv

load_dotenv()
```

## Integration methods

There are two primary ways to integrate the Oxylabs Web Scraper API with LangChain:

### Using langchain-oxylabs package

For Google search queries, use the dedicated [`langchain-oxylabs`](https://python.langchain.com/docs/integrations/tools/oxylabs/) package, which provides a ready-to-use integration:

```python
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
from langchain_oxylabs import OxylabsSearchAPIWrapper, OxylabsSearchRun

load_dotenv()

# Initialize your preferred LLM model
llm = init_chat_model(
    "gpt-4o-mini",
    model_provider="openai",
    api_key=os.getenv("OPENAI_API_KEY")
)

# Initialize the Google search tool
search = OxylabsSearchRun(
    wrapper=OxylabsSearchAPIWrapper(
        oxylabs_username=os.getenv("OXYLABS_USERNAME"),
        oxylabs_password=os.getenv("OXYLABS_PASSWORD")
    )
)

# Create an agent that uses the Google search tool
agent = create_react_agent(llm, [search])

# Example usage
user_input = "When and why did the Maya civilization collapse?"
response = agent.invoke({"messages": user_input})
print(response["messages"][-1].content)
```
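
To see how the agent used the search tool, you can print the full message trace, which includes the user prompt, the tool call, the tool output, and the final answer:

```python
# Inspect the full message trace returned by the agent
for message in response["messages"]:
    message.pretty_print()
```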

### Using the Web Scraper API

For websites beyond Google Search, you can send requests directly to the Web Scraper API:

```python
import os
import requests
from dotenv import load_dotenv
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate

load_dotenv()

def scrape_website(url):
    """Scrape the website using Oxylabs Web Scraper API"""
    payload = {
        "source": "universal",
        "url": url,
        "parse": True
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(os.getenv("OXYLABS_USERNAME"), os.getenv("OXYLABS_PASSWORD")),
        json=payload
    )
    
    if response.status_code == 200:
        data = response.json()
        content = data["results"][0]["content"]
        return str(content)
    else:
        print(f"Failed to scrape website: {response.text}")
        return None

def process_content(content):
    """Process the scraped content using LangChain"""
    if not content:
        print("No content to process.")
        return None
        
    prompt = PromptTemplate.from_template(
        "Analyze the following website content and summarize key points: {content}"
    )
    chain = prompt | OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    result = chain.invoke({"content": content})
    return result

def main(url):
    print("Scraping website...")
    scraped_content = scrape_website(url)
    if scraped_content:
        print("Processing scraped content with LangChain...")
        analysis = process_content(scraped_content)
        print("\nProcessed Analysis:\n", analysis)
    else:
        print("No content scraped.")

if __name__ == "__main__":
    url = "https://sandbox.oxylabs.io/products/1"
    main(url)
```

## Target-specific scrapers

Oxylabs provides [**specialized scrapers**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/targets) for various popular websites. Here are some examples of available sources:

| Website | Source parameter | Key parameters               |
| ------- | ---------------- | ---------------------------- |
| Google  | `google_search`  | `query`                      |
| Amazon  | `amazon_search`  | `query`, `domain` (optional) |
| Walmart | `walmart_search` | `query`                      |
| Target  | `target_search`  | `query`                      |
| Kroger  | `kroger_search`  | `query`, `store_id`          |
| Staples | `staples_search` | `query`                      |

To use a specific scraper, modify the payload in the `scrape_website` function:

```python
# Example for Amazon search
payload = {
    "source": "amazon_search",
    "query": "smartphone",
    "domain": "com",
    "parse": True
}
```
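
To reuse the request logic from the earlier example with any of these sources, you can parameterize the helper on the whole payload instead of a URL. A minimal sketch (the `scrape` function name is illustrative):

```python
import os
import requests
from dotenv import load_dotenv

load_dotenv()

def scrape(payload):
    """Send any Web Scraper API payload and return the parsed content."""
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(os.getenv("OXYLABS_USERNAME"), os.getenv("OXYLABS_PASSWORD")),
        json=payload,
        timeout=60
    )
    response.raise_for_status()
    return response.json()["results"][0]["content"]

# Example: run the Amazon search payload shown above
results = scrape({
    "source": "amazon_search",
    "query": "smartphone",
    "domain": "com",
    "parse": True
})
```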

## Advanced configuration

### Handling dynamic content

The Web Scraper API can handle [**JavaScript rendering**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/js-rendering-and-browser-control/javascript-rendering) by adding the `render` parameter:

```python
payload = {
    "source": "universal",
    "url": url,
    "parse": True,
    "render": "html"
}
```

### Setting user agent type

You can specify different [**user agents**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/http-context-and-job-management/user-agent-type) to simulate different devices:

```python
payload = {
    "source": "universal",
    "url": url,
    "parse": True,
    "render": "html",
    "user_agent_type": "mobile"
}
```

### Using target-specific parameters

Many [**target-specific scrapers**](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/targets) support additional parameters:

```python
# Example for Kroger with location parameters
payload = {
    "source": "kroger_search",
    "query": "organic milk",
    "store_id": "01100002",
    "fulfillment_type": "pickup"
}
```
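
To make any of these target-specific scrapers available to a LangChain agent, you can wrap the request in a custom tool. A minimal sketch using the `@tool` decorator from `langchain_core` (the `search_amazon` tool is illustrative):

```python
import os
import requests
from langchain_core.tools import tool

@tool
def search_amazon(query: str) -> str:
    """Search Amazon and return parsed product results for the query."""
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(os.getenv("OXYLABS_USERNAME"), os.getenv("OXYLABS_PASSWORD")),
        json={"source": "amazon_search", "query": query, "parse": True},
        timeout=60
    )
    response.raise_for_status()
    return str(response.json()["results"][0]["content"])

# The tool can then be passed to an agent alongside others, e.g.:
# agent = create_react_agent(llm, [search, search_amazon])
```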

## Error handling

Implement proper error handling for production applications:

```python
try:
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(os.getenv("OXYLABS_USERNAME"), os.getenv("OXYLABS_PASSWORD")),
        json=payload,
        timeout=60
    )
    response.raise_for_status()
    # Process response
except requests.exceptions.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except requests.exceptions.ConnectionError as conn_err:
    print(f"Connection error occurred: {conn_err}")
except requests.exceptions.Timeout as timeout_err:
    print(f"Timeout error occurred: {timeout_err}")
except requests.exceptions.RequestException as req_err:
    print(f"An error occurred: {req_err}")
```
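
For transient network failures, you may also want a simple retry loop with exponential backoff. A minimal sketch using only `requests` and the standard library (the `post_with_retries` helper is illustrative):

```python
import time
import requests

def post_with_retries(payload, auth, max_attempts=3):
    """POST to the Web Scraper API, retrying transient failures with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.post(
                "https://realtime.oxylabs.io/v1/queries",
                auth=auth,
                json=payload,
                timeout=60
            )
            response.raise_for_status()
            return response
        except (requests.exceptions.Timeout,
                requests.exceptions.ConnectionError):
            if attempt == max_attempts:
                raise
            # Exponential backoff: wait 2s, then 4s, then 8s, ...
            time.sleep(2 ** attempt)
```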


