Octoparse
Octoparse is a data extraction tool. It allows you to scrape public data without coding and bypass most anti-scraping mechanisms by enabling automatic IP rotation and extended session time.
To integrate Octoparse with Oxylabs Datacenter Proxies, follow the simple steps below:
Step 1. Download, install, and then open Octoparse.
Step 2. Create a new task by clicking the +New button in the top-left corner and choosing Custom Task.
Step 3. Type the URL of the webpage you intend to extract data from in the URL Input and click the Save button. We'll use the Oxylabs scraping sandbox as an example.
Step 4. After your selected URL loads, go to Task Settings > Anti-blocking.
Step 5. Now, check Access websites via proxies, enable Use my own proxies, and click Configure.
Step 6. When you click on the Configure button, a pop-up window will appear. Specify proxy details in the following format: IP/host:port:user-username:password.
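The format is simply four proxy details joined with colons. The sketch below shows one way to assemble a single entry in Python. It assumes the Datacenter values from the list that follows this step. USERNAME and PASSWORD are placeholders, and the function name is hypothetical: it is not part of Octoparse or Oxylabs.

```python
def octoparse_proxy_entry(host: str, port: int, username: str, password: str) -> str:
    """Join proxy details as IP/host:port:user-username:password (hypothetical helper)."""
    # Fields are colon-separated, so a colon inside any field would be ambiguous.
    for field in (host, username, password):
        if ":" in field:
            raise ValueError("':' is not allowed inside host, username, or password")
    # The username carries the "user-" prefix; add it only if it is missing.
    if not username.startswith("user-"):
        username = f"user-{username}"
    return f"{host}:{port}:{username}:{password}"


print(octoparse_proxy_entry("dc.oxylabs.io", 8001, "USERNAME", "PASSWORD"))
# dc.oxylabs.io:8001:user-USERNAME:PASSWORD
```

The colon check is a deliberate design choice. A password containing ":" cannot be represented unambiguously in this format, so the sketch fails early instead of producing an entry that would split incorrectly.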
For Datacenter Proxies, you can use:
IP/host: dc.oxylabs.io
Port: 8001
For the Pay-per-IP subscription, the port corresponds to the sequential number assigned to an IP address from the provided list. Hence, port 8001 uses the first IP address on your list.
For the Pay-per-traffic subscription, port 8001 randomly selects an IP address, which then stays the same throughout a session.
Username: user-username (your proxy user’s username)
Password: password (your proxy user’s password)
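For Pay-per-IP, the port rule can be expressed as a small lookup. This is a sketch that assumes the numbering continues sequentially after 8001: port 8002 selects the second IP, and so on. The source only states the 8001 case explicitly. The addresses below are placeholders from the 192.0.2.0/24 documentation range, not real Oxylabs IPs, and the function names are hypothetical.

```python
from typing import List

# Assumption: port 8001 selects the 1st IP on the list, 8002 the 2nd, and so on.
FIRST_PORT = 8001


def port_for_position(position: int) -> int:
    """Return the port for the n-th IP (1-based) on a Pay-per-IP list."""
    if position < 1:
        raise ValueError("IP positions start at 1")
    return FIRST_PORT + position - 1


def ip_for_port(port: int, ip_list: List[str]) -> str:
    """Return the IP address that a given port would select from ip_list."""
    index = port - FIRST_PORT
    if not 0 <= index < len(ip_list):
        raise ValueError(f"port {port} is outside {FIRST_PORT}..{FIRST_PORT + len(ip_list) - 1}")
    return ip_list[index]


# Placeholder addresses from the 192.0.2.0/24 documentation range.
my_ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
print(port_for_position(3))       # 8003
print(ip_for_port(8001, my_ips))  # 192.0.2.10
```

This mapping applies only to Pay-per-IP. With Pay-per-traffic, port 8001 picks a random IP for the session, so there is no fixed position to look up.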
Don't forget to add the user- prefix to the username. You can also specify a geo-location, such as the US, in the user authentication string: user-USERNAME-country-US:PASSWORD. For more details, see our documentation.
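You may want to check the credentials and the country option outside Octoparse before pasting them in. One option is to build the same user authentication string in a short Python script and route a request through the proxy with the standard library. This sketch assumes the proxy accepts an HTTP proxy URL of the form http://user-USERNAME-country-US:PASSWORD@dc.oxylabs.io:8001. The helper names are hypothetical and are not an Octoparse or Oxylabs API.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote


def auth_username(username: str, country: Optional[str] = None) -> str:
    """Build user-USERNAME, or user-USERNAME-country-XX when a country is given."""
    name = f"user-{username}"
    if country:
        # Country codes are written in upper case, e.g. "US".
        name += f"-country-{country.upper()}"
    return name


def proxy_url(username: str, password: str, country: Optional[str] = None,
              host: str = "dc.oxylabs.io", port: int = 8001) -> str:
    # Percent-encode credentials so characters such as "@" or ":" do not break
    # the URL; urllib decodes them again before sending Proxy-Authorization.
    user = quote(auth_username(username, country), safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def build_opener(username: str, password: str,
                 country: Optional[str] = None) -> urllib.request.OpenerDirector:
    url = proxy_url(username, password, country)
    # Send both plain HTTP and HTTPS (CONNECT) traffic through the proxy.
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)
```

For example, `build_opener("USERNAME", "PASSWORD", country="us").open(url)` sends a request through the proxy. Pointing `url` at an IP-echo service of your choice lets you confirm that the exit IP and country match what you expect before entering the same values in Octoparse.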
Please note that the screenshots in this guide depict the setup process with Residential Proxies for illustrative purposes. For Datacenter Proxies, follow the specific guidelines provided in the text.
Step 7. Set up the Switch interval depending on whether you use a rotating or sticky session type.
Step 8. Click the Confirm button to save your changes, then click Save.
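Octoparse handles proxy switching itself. As a rough mental model only, the Switch interval from Step 7 behaves like the toy rotator below. It keeps returning the same proxy until the interval has elapsed, then moves to the next one. A long interval approximates sticky sessions, and a short one approximates rotation. The class, its parameters, and the proxy names are hypothetical and do not describe Octoparse internals.

```python
import itertools
import time
from typing import Callable, Iterable


class IntervalRotator:
    """Toy model: keep one proxy until `interval` seconds pass, then switch."""

    def __init__(self, proxies: Iterable[str], interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        pool = list(proxies)
        if not pool:
            raise ValueError("need at least one proxy")
        self._cycle = itertools.cycle(pool)
        self._interval = interval
        self._clock = clock
        self._current = next(self._cycle)
        self._since = clock()

    def current(self) -> str:
        now = self._clock()
        # Advance one step once the interval has elapsed since the last switch.
        if now - self._since >= self._interval:
            self._current = next(self._cycle)
            self._since = now
        return self._current


# A fake clock makes the behavior easy to follow without waiting.
fake_now = [0.0]
rotator = IntervalRotator(["proxy-a", "proxy-b"], interval=60, clock=lambda: fake_now[0])
print(rotator.current())  # proxy-a
fake_now[0] = 60.0
print(rotator.current())  # proxy-b
```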
Proxies are now set up.
Step 1. Select the desired elements (video game titles) you want to scrape. To extract all elements from the same category, choose Select all similar elements and specify Text.
Step 2. Set up pagination to scrape multiple pages. This website uses numbered pages, so choose Next page button.
Step 3. To automate pagination, choose the exact button in the page layout that opens the next page (Forward).
Step 4. Complete the scraping setup and press ▶Run.
Step 5. Choose Run on your device with Standard Mode to receive data as a file on your PC.
Step 6. Let the scraping process run until it completes. It ends when the final product page is reached or when you stop it manually.
Step 7. Export the collected data and select the file format.
Here's the final result in a spreadsheet.
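In code terms, the point-and-click steps for selecting elements and pagination amount to two parts. First, collect the text of every element that shares a tag and class ("Select all similar elements" with Text). Second, follow the Forward link until there is none, which is the same stop condition the run uses when the final page is reached. The Python sketch below uses only the standard library. The tag and class ("h4", "title") and the fake pages are hypothetical stand-ins, not the sandbox's real markup, and no network requests are made.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, List, Optional, Tuple


class SimilarTextCollector(HTMLParser):
    """Collect the text of every <tag class="css_class"> element."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.texts: List[str] = []
        self._depth = 0  # > 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self.tag:
                self._depth += 1  # same tag nested inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.texts[-1] += data  # includes text of child elements such as <b>


def collect_texts(html: str, tag: str, css_class: str) -> List[str]:
    parser = SimilarTextCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# fetch(url) -> (items on that page, URL behind the Forward button or None)
Fetcher = Callable[[str], Tuple[List[str], Optional[str]]]


def paginate(start_url: str, fetch: Fetcher, max_pages: int = 50) -> Iterator[str]:
    """Follow next-page links until none is left, yielding items page by page."""
    seen = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)  # stop if a Forward link loops back to a visited page
        items, url = fetch(url)
        yield from items


# Two fake pages stand in for real HTTP responses.
PAGES = {
    "/page/1": ('<h4 class="title">Game A</h4><h4 class="title">Game B</h4>', "/page/2"),
    "/page/2": ('<h4 class="title">Game C</h4>', None),
}


def fake_fetch(url: str) -> Tuple[List[str], Optional[str]]:
    html, next_url = PAGES[url]
    return collect_texts(html, "h4", "title"), next_url


print(list(paginate("/page/1", fake_fetch)))  # ['Game A', 'Game B', 'Game C']
```

Injecting `fetch` keeps the pagination loop independent of how pages are downloaded. A real fetcher could route its requests through a proxy-configured opener. The `max_pages` and `seen` guards play the role of stopping the run manually.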
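If you pick CSV as the export format, you can also read the resulting file programmatically. This is a minimal sketch using Python's csv module. The column name "Title" is a hypothetical example: Octoparse names columns after the fields you defined, so adjust it to match your export.

```python
import csv
import io
from typing import List


def load_column(csv_text: str, column: str = "Title") -> List[str]:
    """Return the non-empty values of one column from exported CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or blank.
    return [row[column].strip() for row in reader if (row.get(column) or "").strip()]


def load_column_from_file(path: str, column: str = "Title") -> List[str]:
    # utf-8-sig also accepts files that start with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return load_column(handle.read(), column)
```

For example, `load_column_from_file("export.csv")` returns the titles as a list, where "export.csv" stands for whatever file name you chose when exporting.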
That’s it! You are all set up and ready to focus on your web scraping tasks with Octoparse.