# LangChain

The **LangChain** integration with the [**Oxylabs Web Scraper API**](https://oxylabs.io/products/scraper-api/web) lets you collect and process web data with an LLM (large language model) in the same workflow.

## Overview

**LangChain** is a framework for building applications that combine LLMs with tools, APIs, and web data. It supports both Python and JavaScript. Use it with the [**Oxylabs Web Scraper API**](http://developers.oxylabs.io/scraper-apis/web-scraper-api?_gl=1*1ljhay3*_gcl_aw*R0NMLjE3NDYxODM0ODcuQ2owS0NRancydEhBQmhDaUFSSXNBTlp6RFdvSXlSNVg3blQtd0ZEakxHOUlvdUhyQmtoRTRCeUNwc054dFJVMmh0Z3dZTTR3Nm90SjVKOGFBbHhhRUFMd193Y0I.*_gcl_au*MjU4NDEzMTkwLjE3NDExNzU2MzI.) to:

* Scrape structured data without handling CAPTCHAs, IP blocks, or JS rendering
* Process the results with an LLM in the same pipeline
* Build end-to-end workflows from extraction to AI-powered output

## Getting started

**Create your API user credentials**: sign up for a free trial or purchase the product in the [**Oxylabs dashboard**](https://dashboard.oxylabs.io/en/registration) to create your API user credentials (`USERNAME` and `PASSWORD`).

{% hint style="warning" %}
If your account needs more than one API user, contact our [**customer support**](mailto:support@oxylabs.io) or message our 24/7 live chat support.
{% endhint %}

This guide uses the Python programming language. Install the required libraries with pip:

```bash
pip install -qU langchain-oxylabs langchain-openai langgraph requests python-dotenv
```

## Environment setup

Create a `.env` file in your project directory and add your Oxylabs API user credentials and your OpenAI API key:

```
OXYLABS_USERNAME=your-username
OXYLABS_PASSWORD=your-password
OPENAI_API_KEY=your-openai-key
```

Load these environment variables in your Python script:

```python
import os
from dotenv import load_dotenv

load_dotenv()
```
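A missing or empty variable only surfaces later as an authentication failure, so it can help to check for all three right after loading them. The following is a minimal sketch, not part of the Oxylabs or LangChain APIs: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced here for illustration.

```python
import os

# Variable names taken from the .env example; the helper itself is hypothetical.
REQUIRED_VARS = ("OXYLABS_USERNAME", "OXYLABS_PASSWORD", "OPENAI_API_KEY")

def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Call it after `load_dotenv()` so that values from the `.env` file are already present in `os.environ`.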

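## Integration methods

There are two main ways to integrate the Oxylabs Web Scraper API with LangChain:

### Using the langchain-oxylabs package

For Google search queries, use the dedicated [`langchain-oxylabs`](https://python.langchain.com/docs/integrations/tools/oxylabs/) package, which provides a ready-to-use integration: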

```python
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
from langchain_oxylabs import OxylabsSearchAPIWrapper, OxylabsSearchRun

load_dotenv()

# Initialize your preferred LLM model
llm = init_chat_model(
    "gpt-4o-mini",
    model_provider="openai",
    api_key=os.getenv("OPENAI_API_KEY")
)

# Initialize the Google search tool
search = OxylabsSearchRun(
    wrapper=OxylabsSearchAPIWrapper(
        oxylabs_username=os.getenv("OXYLABS_USERNAME"),
        oxylabs_password=os.getenv("OXYLABS_PASSWORD")
    )
)

# Create an agent that uses the Google search tool
agent = create_react_agent(llm, [search])

# Example usage
user_input = "When and why did the Maya civilization collapse?"
response = agent.invoke({"messages": user_input})
print(response["messages"][-1].content)
```

### Using the Web Scraper API

To access websites other than Google Search, you can send requests directly to the Web Scraper API:

```python
import os
import requests
from dotenv import load_dotenv
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate

load_dotenv()

def scrape_website(url):
    """Scrape a website using the Oxylabs Web Scraper API"""
    payload = {
        "source": "universal",
        "url": url,
        "parse": True
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(os.getenv("OXYLABS_USERNAME"), os.getenv("OXYLABS_PASSWORD")),
        json=payload
    )
    
    if response.status_code == 200:
        data = response.json()
        content = data["results"][0]["content"]
        return str(content)
    else:
        print(f"Failed to scrape website: {response.text}")
        return None

def process_content(content):
    """Process the scraped content using LangChain"""
    if not content:
        print("No content to process.")
        return None
        
    prompt = PromptTemplate.from_template(
        "Analyze the following website content and summarize key points: {content}"
    )
    chain = prompt | OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    result = chain.invoke({"content": content})
    return result

def main(url):
    print("Scraping website...")
    scraped_content = scrape_website(url)
    if scraped_content:
        print("Processing scraped content with LangChain...")
        analysis = process_content(scraped_content)
        print("\nProcessed Analysis:\n", analysis)
    else:
        print("No content scraped.")

if __name__ == "__main__":
    url = "https://sandbox.oxylabs.io/products/1"
    main(url)
```
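The `scrape_website` function above indexes straight into `data["results"][0]["content"]`, which raises an exception if the response carries no results. A more defensive extraction step might look like the sketch below. `extract_content` is a hypothetical helper, and the JSON serialization assumes that `content` can be a dict or list when `parse` is enabled.

```python
import json

def extract_content(data):
    """Return the first result's content as a string, or None if absent."""
    results = data.get("results") or []
    if not results:
        return None
    content = results[0].get("content")
    if content is None:
        return None
    # Assumption: parsed content may be a dict or list; serialize it as JSON
    # so the LLM prompt receives valid JSON rather than a Python repr.
    if isinstance(content, (dict, list)):
        return json.dumps(content, ensure_ascii=False)
    return str(content)
```

Inside `scrape_website`, `return extract_content(response.json())` could then replace the direct indexing.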

## Target-specific scrapers

Oxylabs provides [**dedicated scrapers**](/api-targets/cn/ren-yi-yu-ming.md) for a variety of popular websites. Here are some examples of the available sources:

| Website | Source parameter | Required parameters          |
| ------- | ---------------- | ---------------------------- |
| Google  | `google_search`  | `query`                      |
| Amazon  | `amazon_search`  | `query`, `domain` (optional) |
| Walmart | `walmart_search` | `query`                      |
| Target  | `target_search`  | `query`                      |
| Kroger  | `kroger_search`  | `query`, `store_id`          |
| Staples | `staples_search` | `query`                      |

To use a specific scraper, modify the `payload` in the `scrape_website` function:

```python
# Example for Amazon search
payload = {
    "source": "amazon_search",
    "query": "smartphone",
    "domain": "com",
    "parse": True
}
```
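When a script switches between several sources, checking required parameters before sending the request can catch mistakes early. The sketch below encodes only the required parameters from the table above. `build_payload` and `REQUIRED_PARAMS` are hypothetical names, and defaulting `parse` to `True` is an assumption based on the Amazon example.

```python
# Required parameters per source, as listed in the table above.
REQUIRED_PARAMS = {
    "google_search": ["query"],
    "amazon_search": ["query"],
    "walmart_search": ["query"],
    "target_search": ["query"],
    "kroger_search": ["query", "store_id"],
    "staples_search": ["query"],
}

def build_payload(source, parse=True, **params):
    """Build a request payload, raising if a required parameter is missing."""
    if source not in REQUIRED_PARAMS:
        raise ValueError(f"Unknown source: {source}")
    missing = [p for p in REQUIRED_PARAMS[source] if p not in params]
    if missing:
        raise ValueError(f"{source} requires: {', '.join(missing)}")
    return {"source": source, "parse": parse, **params}
```

For example, `build_payload("amazon_search", query="smartphone", domain="com")` produces the same payload as the Amazon example above.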

## Advanced configuration

### Handling dynamic content

The Web Scraper API can handle [**JavaScript rendering**](/products/cn/web-scraper-api/features/js-rendering-and-browser-control.md) when you add the `render` parameter:

```python
payload = {
    "source": "universal",
    "url": url,
    "parse": True,
    "render": "html"
}
```

### Setting the user agent type

You can specify different [**user agent types**](/products/cn/web-scraper-api/features/http-context-and-job-management/user-agent-type.md) to emulate different devices:

```python
payload = {
    "source": "universal",
    "url": url,
    "parse": True,
    "render": "html",
    "user_agent_type": "mobile"
}
```
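Since `render` and `user_agent_type` are optional, one way to avoid maintaining several near-identical payload literals is to layer them onto a base payload only when they are set. This is a small sketch; `with_options` is a hypothetical helper and not part of any Oxylabs library.

```python
def with_options(payload, render=None, user_agent_type=None):
    """Return a copy of payload with only the optional settings that were given."""
    options = {"render": render, "user_agent_type": user_agent_type}
    # Skip unset options so the request carries only keys chosen explicitly.
    return {**payload, **{k: v for k, v in options.items() if v is not None}}

base = {
    "source": "universal",
    "url": "https://sandbox.oxylabs.io/products/1",
    "parse": True,
}
mobile_payload = with_options(base, render="html", user_agent_type="mobile")
```

The helper returns a new dictionary, so `base` can be reused for other variants.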

### Using target-specific parameters

Many [**target-specific scrapers**](https://developers.oxylabs.io/api-targets/cn/) support additional parameters:

```python
# Example for Kroger with location parameters
payload = {
    "source": "kroger_search",
    "query": "organic milk",
    "store_id": "01100002",
    "fulfillment_type": "pickup"
}
```

## Error handling

Implement proper error handling for production applications:

```python
try:
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(os.getenv("OXYLABS_USERNAME"), os.getenv("OXYLABS_PASSWORD")),
        json=payload,
        timeout=60
    )
    response.raise_for_status()
    # Process response
except requests.exceptions.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except requests.exceptions.ConnectionError as conn_err:
    print(f"Connection error occurred: {conn_err}")
except requests.exceptions.Timeout as timeout_err:
    print(f"Timeout error occurred: {timeout_err}")
except requests.exceptions.RequestException as req_err:
    print(f"An error occurred: {req_err}")
```
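Connection errors and timeouts are often transient, so production code commonly retries them instead of only logging. The following is a minimal sketch of retrying with exponential backoff. `with_retries` is a hypothetical helper, and the attempt count and delays are illustrative assumptions rather than Oxylabs recommendations.

```python
import time

def with_retries(func, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as err:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ... base_delay
            print(f"Attempt {attempt} failed ({err}); retrying in {delay:.1f}s")
            sleep(delay)
```

To use it with the request above, wrap the `requests.post(...)` call in a function and pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)` so that HTTP errors such as invalid credentials are not retried.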
