OxyCopilot (Beta)
OxyCopilot is a free Web Scraper API feature that simplifies onboarding and helps users solve complex use cases without coding knowledge. It currently includes two separate features:
Scraper builder
Custom Parser builder
OxyCopilot is accessible through the Scraper API Playground in the Oxylabs dashboard.
OxyCopilot helps users build a scraper (request code) for the Web Scraper API without needing to understand the documentation or field logic.
URL: Provide the URL that you want to scrape.
Prompt: Describe your requirements (e.g., localization, JS rendering, etc.).
Step 2: Parsing
You have three options for handling parsing:
Custom Parser: Select "Add parsing instructions" to create your own parsing logic using the Custom Parser builder.
Dedicated Parser: If the URL is from a website for which we provide a dedicated parser and you want to use it, select "Continue with Dedicated Parser".
No Parsing: Choose to proceed without parsing if structured data isn’t needed.
If the URL belongs to a website for which we have a dedicated parser, but you don’t need structured data, select "Continue with Dedicated Parser" and disable the parse parameter in the playground's settings. Avoid using the exit button, as it won’t save the pre-filled parameters.
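As a rough illustration, the three parsing options can be thought of as three different sets of request fields. The sketch below is not taken from the Playground: the helper `parsing_params` and the mode names are hypothetical, and the `parsing_instructions` field name is an assumption about the request format. Only the `parse` parameter is mentioned above.

```python
from typing import Any, Dict, Optional


def parsing_params(mode: str, instructions: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the parsing-related request fields for one of the three options.

    Hypothetical helper: the mode names are made up for this sketch, and
    "parsing_instructions" is an assumed field name.
    """
    if mode == "custom":
        if not instructions:
            raise ValueError("custom parsing needs parsing instructions")
        # Custom Parser: parsing enabled, with your own instructions attached.
        return {"parse": True, "parsing_instructions": instructions}
    if mode == "dedicated":
        # Dedicated Parser: parsing enabled, no custom instructions.
        return {"parse": True}
    if mode == "none":
        # No Parsing: structured data is not needed.
        return {"parse": False}
    raise ValueError(f"unknown parsing mode: {mode!r}")


print(parsing_params("dedicated"))
```

The case described in the note above (dedicated parser available, but no structured data needed) corresponds to the `"none"` branch here: the other pre-filled parameters stay, and only `parse` is turned off.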
Based on your prompt, OxyCopilot pre-fills the necessary parameters in the Playground. You will see the request code and parameters for your use case and can adjust the parameters if needed.
Step 4: Submit the request and copy
If everything looks good, submit the request and check that the output matches your expectations. Then copy the request code to use in your own scraping tasks with the Web Scraper API.
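The copied request code will depend on your prompt and chosen language. As a minimal sketch of what such code might look like in Python, the example below builds (but does not send) a request. The endpoint URL, the `source`, `render`, and `geo_location` field names, and the placeholder credentials are all assumptions for illustration; treat the code generated by the Playground as the authoritative version.

```python
import base64
import json
import urllib.request

# Assumed endpoint; use whatever the Playground's copied code shows.
API_URL = "https://realtime.oxylabs.io/v1/queries"


def build_request(username, password, url, render_js=False, geo_location=None, parse=False):
    """Build a POST request carrying the scraping parameters as JSON."""
    payload = {"source": "universal", "url": url}  # assumed field names
    if render_js:
        payload["render"] = "html"  # assumed value for JS rendering
    if geo_location:
        payload["geo_location"] = geo_location  # assumed localization field
    if parse:
        payload["parse"] = True
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


req = build_request(
    "USERNAME", "PASSWORD", "https://example.com",
    render_js=True, geo_location="United States",
)
print(json.loads(req.data))
# To actually submit it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the parameters before spending any API traffic.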
Leverage the Custom Parser feature with OxyCopilot to build a parser without needing to write code or analyze website structures manually.
URL(s): You can provide up to 3 URLs for which you want to generate parsing instructions. OxyCopilot uses the HTML of the provided URLs to determine the best logic for extracting the required fields.
The more URLs you provide, the more robust the parsing instructions will be, as OxyCopilot identifies common patterns across similar pages. Note that additional URLs may increase the wait time for results.
Prompt: The prompt is the key component in building a natural language schema, which serves as the basis for generating the actual parsing instructions. The prompt should clearly describe the fields that need to be parsed.
This step allows you to fine-tune the parsing schema to better meet your needs or troubleshoot any issues.
This table visualizes the input used by the AI to generate parsing instructions. The schema defines which fields need to be parsed and consists of various object types (explained in the table below).
Each item in the schema has the following fields:
Name (required): Used as the object key in the parsing instructions and shown in the parsed data.
Description (optional but recommended): Helps improve parsing accuracy.
Reorder items: Drag and drop items using the dots on the left side to change their order (only items within the same nesting level can be moved).
Edit items: Click the edit icon to modify any field.
Delete items: You can delete any item at the parent level.
Add new items: Add new items to the parent level.
Once you update the schema, click the "Refresh output" button to regenerate the instructions and preview the parsed data.
String: A single text output. Example: "title": "Example product title"
Number: A single number. Example: "price": 9.99
Array of strings: A list of text outputs. Example: "products": ["product 1", "product 2", "product 3"]
Array of numbers: A list of numbers. Example: "pages": [1, 2, 3]
Array of objects: A list of objects/items, each having their own objects inside (_items block in the parsing instructions).
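To sanity-check parsed output against the schema types listed above, a small type check can help. This is only a sketch: the `matches_type` helper is hypothetical and not part of OxyCopilot, and it assumes the parsed data has already been loaded from JSON into Python values.

```python
from typing import Any


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# One check per schema type named in the documentation.
CHECKS = {
    "String": lambda v: isinstance(v, str),
    "Number": _is_number,
    "Array of strings": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
    "Array of numbers": lambda v: isinstance(v, list) and all(_is_number(x) for x in v),
    "Array of objects": lambda v: isinstance(v, list) and all(isinstance(x, dict) for x in v),
}


def matches_type(schema_type: str, value: Any) -> bool:
    """Return True if a parsed value has the shape the schema type describes."""
    return CHECKS[schema_type](value)


print(matches_type("Number", 9.99))
print(matches_type("Array of strings", ["product 1", "product 2", "product 3"]))
```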
Select "Array of objects": This option adds a child object and button.
Fill out object names: To save the item to the schema, you must fill out the names of both the parent and child objects. Once done, the checkmark will turn green.
Child object requirement: An "Array of objects" must have at least one child.
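The schema rules described above (every item needs a name, and an "Array of objects" needs at least one named child) can be modelled as a small data structure. The `SchemaItem` class and `validate` function below are hypothetical names for illustration; they show the rules, not how OxyCopilot stores the schema internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaItem:
    name: str
    type: str                      # e.g. "String", "Number", "Array of objects"
    description: str = ""          # optional but recommended
    children: List["SchemaItem"] = field(default_factory=list)


def validate(item: SchemaItem) -> List[str]:
    """Return a list of problems; an empty list means the item can be saved."""
    problems = []
    if not item.name.strip():
        problems.append("item is missing a name")
    if item.type == "Array of objects":
        if not item.children:
            problems.append(f"{item.name or '<unnamed>'}: needs at least one child")
        # Child names are required too, so check them recursively.
        for child in item.children:
            problems.extend(validate(child))
    return problems


reviews = SchemaItem(
    name="reviews",
    type="Array of objects",
    description="Customer reviews",
    children=[SchemaItem(name="author", type="String")],
)
print(validate(reviews))
```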
By default, parsed data is based on the first URL provided in Step 1. You can also provide a different URL to test the parsing instructions.
Instructions are generated based on the initial URLs and do not account for the test URLs. Editing the prompt or URLs will reset the schema, requiring a full regeneration.
Once the instructions are satisfactory:
Use the "Copy" button to copy the instructions and paste them into your scraper code.
Alternatively, save the instructions to your API Playground session, adjust other request parameters, test, and then copy the complete request code in your preferred programming language.
Type: String; Name: product_title; Description: Product title
Type: Number; Name: price; Description: Product price
Type: Array of strings; Name: related_products; Description: Related product titles below the main product information
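For the example schema above (product_title, price, related_products), the generated parsing instructions might look roughly like the sketch below once pasted into scraper code. Everything beyond the three field names is an assumption: the `_fns`/`_fn`/`_args` layout, the function names `xpath_one`, `xpath`, and `amount_from_string`, the XPath selectors, and the target URL are illustrative placeholders. The instructions OxyCopilot actually generates for your pages are the ones to use.

```python
import json

# Hypothetical selectors; real ones depend on the target page's HTML.
parsing_instructions = {
    "product_title": {
        "_fns": [{"_fn": "xpath_one", "_args": ["//h1/text()"]}],
    },
    "price": {
        "_fns": [
            {"_fn": "xpath_one", "_args": ["//*[@class='price']/text()"]},
            {"_fn": "amount_from_string"},  # assumed: turns "$9.99" into a number
        ],
    },
    "related_products": {
        "_fns": [{"_fn": "xpath", "_args": ["//*[@class='related']//h3/text()"]}],
    },
}

# Assumed request shape: parsing enabled, with the instructions attached.
payload = {
    "source": "universal",
    "url": "https://example.com/product",
    "parse": True,
    "parsing_instructions": parsing_instructions,
}

print(json.dumps(payload, indent=2))
```

Each key in `parsing_instructions` matches a Name from the schema, which is why the same keys appear in the parsed data.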
This feature is currently in Beta. We welcome your feedback and suggestions for improvement. Please don't hesitate to reach out to us at support@oxylabs.io or connect with our 24/7 live chat support.