Oxylabs Documentation

Octoparse

Octoparse is a data extraction tool. It allows you to scrape public data without coding and bypass most anti-scraping mechanisms by enabling automatic IP rotation and extended session time.
To integrate Octoparse with Oxylabs Dedicated Datacenter Proxies, follow the simple steps below or watch this video tutorial:
Step 1. Download, install and open Octoparse.
Step 2. Create a new task by clicking the +New button in the top-left corner and choosing Custom Task.
Custom Task on Octoparse
Step 3. Type the URL of the webpage you intend to extract data from in the URL Input field and click the Save button. We'll use books.toscrape.com as an example.
URL input example
Step 4. After your selected URL loads, click the top-right Settings button.
Settings
Step 5. Scroll down to the Anti-blocking Settings.
Step 6. Put a checkmark in the Access websites via proxies box. After this step, you will see Use my own proxies and the Configure button.
Configure proxies
Step 7. Click the Configure button to open a pop-up window. Octoparse only accepts proxies in the IP:PORT format. For Dedicated Datacenter Proxies, enter these details:
IP: a specific IP address (for example, 123.123.123.123)
In the case of Dedicated Datacenter Proxies, you will need to choose an IP address from the purchased list. Please refer to our documentation for more details.
If you’re using a username:password authentication method:
Port: 6000
If you’re using whitelisted IPs:
Port: 65432
Look at the example below.
Proxy Settings
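As a rough illustration of the IP:PORT format described in Step 7, the sketch below builds the proxy entry for each authentication method. The function name, the dictionary keys, and the idea of generating the string in Python are assumptions for illustration only; the port numbers are copied from the step above and should be confirmed against your own Oxylabs account.

```python
import ipaddress

# Ports as listed in Step 7 above (assumed correct for your account; verify first).
PORTS = {
    "user_pass": 6000,        # username:password authentication
    "whitelisted_ip": 65432,  # whitelisted IPs
}


def octoparse_proxy_entry(ip: str, auth_method: str) -> str:
    """Return an 'IP:PORT' string for a Dedicated Datacenter proxy.

    `auth_method` is a hypothetical label: 'user_pass' or 'whitelisted_ip'.
    """
    # Raises ValueError if `ip` is not a well-formed IPv4 address.
    ipaddress.IPv4Address(ip)
    if auth_method not in PORTS:
        raise ValueError(f"unknown auth method: {auth_method!r}")
    return f"{ip}:{PORTS[auth_method]}"


# Example with the placeholder address used in this guide.
print(octoparse_proxy_entry("123.123.123.123", "user_pass"))
print(octoparse_proxy_entry("123.123.123.123", "whitelisted_ip"))
```

Generating the entry this way may help when you have several purchased IPs and want to avoid typing mistakes before pasting them into the Octoparse pop-up.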
Step 8. Set up the Switch interval depending on whether you use a rotating or sticky session type.
Switch interval
Step 9. Save changes by clicking the Confirm button.
Step 10. To ensure the Octoparse integration was successful, check if there is a checkmark next to the Configure button in the Anti-blocking settings section.
Configure
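The checkmark in Step 10 only confirms that Octoparse saved the settings. If you also want to check the proxy itself outside Octoparse, a minimal sketch using Python's standard library might look like the one below. The helper names, the placeholder credentials, and the choice of books.toscrape.com as the test target are assumptions for illustration; this is not part of the Octoparse workflow.

```python
import urllib.request
from urllib.parse import quote


def proxy_url(ip, port, username=None, password=None):
    """Build an http:// proxy URL, with credentials if both are given."""
    cred = ""
    if username and password:
        # Percent-encode credentials so special characters do not break the URL.
        cred = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"http://{cred}{ip}:{port}"


def build_proxy_opener(ip, port, username=None, password=None):
    """Return a urllib opener that routes HTTP and HTTPS traffic via the proxy."""
    url = proxy_url(ip, port, username, password)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


def fetch_status(opener, url="http://books.toscrape.com", timeout=10):
    """Request `url` through the opener and return the HTTP status code."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.status
```

To try it, you could call `fetch_status(build_proxy_opener("123.123.123.123", 6000, "USERNAME", "PASSWORD"))` with your own IP and credentials substituted; a successful response suggests the proxy accepts your connection.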
Step 11. Click the Save button, and it'll bring you to the main screen of the page you’re scraping.
Save
You've successfully set up Oxylabs' proxies with Octoparse.
Below are some additional steps on how to start scraping:
Step 12. Click on the lightbulb, which will expand and give you choices on whether to paginate or add a page scroll.
Step 13. After you’ve made your choice, click on the Create Workflow button.
Create Workflow
Step 14. You can now select the page element you’d like to extract data from. In our case, we’ll choose Mystery. Click on it and select Extract text of the selected element.
Extract page element
Step 15. Afterward, you’ll be presented with the pop-up below. At the top-right, click Save and then Run.
Save and Run
Step 16. A pop-up will appear with multiple choices. Choose whichever is most relevant for you (some are paid options) and continue. For example, we’ll pick Run on your device and Standard mode.
Run your task
Step 17. A new page will open where the scraping process will begin. You can pause and resume it whenever you want.
Scraping process
Step 18. Since this is merely an example, we’ll stop here. Confirm to stop the run.
Stop the run
Step 19. Here, some statistics will be shown for your scraping task. You can choose to export data later or now.
Statistics
Step 20. If you select Export Data, a final pop-up will appear, allowing you to choose an export format for the data.
Export data
Step 21. Pick which one is relevant for you.
That’s it – you are set up and ready to focus on your web scraping tasks with Octoparse.
Find the original Octoparse integration blog post here.