Scraper Blocked by Amazon? IP Rotation for Scraping Can be the Answer


IP rotation for scraping is one way to evade Amazon’s anti-scraping measures. It involves masking your IP address by cycling through different proxies while accessing Amazon’s website, whether you make plain HTTP requests or drive an automated browser. Confused about how to proceed? Read on.

This tutorial shows you how to manage IP rotation using various Python libraries.

IP Rotation for Scraping Static Amazon Pages

The simplest way to scrape Amazon with IP rotation is to use Python’s urllib or the requests library. Here’s how you can rotate proxies with these libraries:

Using Python requests

Import requests and, from itertools, import cycle.

import requests
from itertools import cycle

The cycle class lets you iterate through a list cyclically: after the last item in the list, you get the first item again.
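
For instance, this quick check shows how next() wraps back to the start of the list:

# cycle repeats the sequence indefinitely: a, b, c, a, b, c, ...
pool = cycle(['a', 'b', 'c'])
print(next(pool), next(pool), next(pool), next(pool))  # a b c a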

Next, store the available proxies in a list.

# Map both HTTP and HTTPS traffic through each proxy
proxies = [
    {'http': 'http://proxy1:8080', 'https': 'http://proxy1:8080'},
    {'http': 'http://proxy2:8080', 'https': 'http://proxy2:8080'},
    {'http': 'http://proxy3:8080', 'https': 'http://proxy3:8080'}
]

Use this list to create an object of the class cycle.

proxy_pool = cycle(proxies)

The next step is to make HTTP requests. For convenience, you can create a dedicated function to make an HTTP request. This function will

  1. Get the next proxy from the pool using next().
  2. Make an HTTP request to Amazon’s URL.
# Using requests
def make_request(url):
    # Get the next proxy from the pool
    proxy = next(proxy_pool)
    # Request the URL through the proxy
    response = requests.get(url, proxies=proxy, timeout=10)
    if response.status_code != 200:
        raise Exception('Failed')
    return response.text

Want to learn more about using Python requests for web scraping? Read this article on web scraping with Python requests.

Using urllib

When using urllib, you need to create a custom opener for each proxy:

Start by importing these from urllib.request:

  • ProxyHandler: To handle proxies
  • build_opener: Builds an opener that sends HTTP requests through handlers like ProxyHandler
from urllib.request import ProxyHandler, build_opener

Next, define a function to accept a URL and make an HTTP request with the proxy. This function:

  1. Gets the next proxy from the list defined previously
  2. Creates a ProxyHandler object using the obtained proxy
  3. Builds an opener using the proxy handler object
  4. Tries to make an HTTP request using the opener
  5. Raises an exception if the response status is not 200
def make_urllib_request(url):
    # Get the next proxy from the pool
    proxy = next(proxy_pool)
    # Create a handler and an opener that route requests through the proxy
    handler = ProxyHandler(proxy)
    opener = build_opener(handler)
    response = opener.open(url, timeout=10)
    # urllib responses expose the HTTP status via getcode(), not status_code
    if response.getcode() != 200:
        raise Exception('Failed')
    return response.read().decode('utf-8')

You can now call either function whenever you need to make a request to the URL with a new IP address. Use a loop, and in each iteration, try calling make_request():

for i in range(len(proxies) * 2):
    try:
        make_request(url)
        break
    except Exception:
        continue

The number of iterations depends on the number of attempts you want to make with each proxy.

For instance, if you want to try each proxy twice, the number of iterations should be twice the number of proxies in your pool.
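
To make that relationship explicit, here’s a minimal sketch of a retry wrapper; the fetch_with_retries name and the attempts_per_proxy parameter are illustrative additions, not part of the original code:

def fetch_with_retries(url, attempts_per_proxy=2):
    # Total attempts = number of proxies x attempts per proxy
    for _ in range(len(proxies) * attempts_per_proxy):
        try:
            return make_request(url)
        except Exception:
            continue  # Move on to the next proxy in the cycle
    raise Exception('All proxies failed')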

IP Rotation for Scraping Dynamic Amazon Pages

Suppose you want to scroll and load dynamic elements on Amazon’s page. You need to use a browser automation library, like Playwright. Here’s how you would rotate proxies in that case.

from itertools import cycle

from playwright.sync_api import sync_playwright

def scrape_with_playwright():
    proxies = ['http://proxy1:8080', 'http://proxy2:8080']
    proxy_cycle = cycle(proxies)

    with sync_playwright() as p:
        success = False
        for i in range(len(proxies)):
            proxy = next(proxy_cycle)
            # Launch a Chromium instance that routes traffic through the proxy
            browser = p.chromium.launch(
                proxy={
                    'server': proxy,
                }
            )
            try:
                page = browser.new_page()
                page.goto('https://amazon.com')
                # Your scraping logic here
                success = True
                break
            except Exception:
                continue
            finally:
                browser.close()
        if not success:
            print('All proxies failed')

The above code cycles through proxies as before. It uses a loop, and in each iteration, it starts a Chromium instance with a proxy and tries to navigate to the target page. 

If the navigation is successful, the loop breaks; otherwise, it moves on to the next proxy. 
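
Relaunching Chromium for every proxy can be slow. Playwright also accepts a proxy option at the browser-context level, so you can launch one browser and rotate proxies per context. Here’s a minimal sketch under that assumption; note that some Chromium versions may require a placeholder browser-level proxy (such as 'per-context') before per-context proxies take effect:

def scrape_with_context_proxies():
    proxies = ['http://proxy1:8080', 'http://proxy2:8080']

    with sync_playwright() as p:
        # Placeholder browser-level proxy; some Chromium versions need this
        # before per-context proxies are honored
        browser = p.chromium.launch(proxy={'server': 'per-context'})
        try:
            for proxy in proxies:
                # Each context routes its traffic through its own proxy
                context = browser.new_context(proxy={'server': proxy})
                try:
                    page = context.new_page()
                    page.goto('https://amazon.com')
                    # Your scraping logic here
                    break
                except Exception:
                    continue
                finally:
                    context.close()
        finally:
            browser.close()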

Need to know how to scrape Amazon using Playwright? Read this article on scraping Amazon product offers and sellers.

Adding Robustness with Delays and Validation

Here’s a more practical implementation that includes proxy validation and request delays. This method uses a custom class, ProxyManager.

This class accepts a proxy list, a minimum delay, and a maximum delay as arguments when creating its object. It provides two methods:

  1. validate_proxy(): Ensures that the proxy works
  2. get_next_proxy(): Gets the next proxy from the proxy pool after ensuring that it works.
import time
import random
import requests
from itertools import cycle

class ProxyManager:
    def __init__(self, proxies, min_delay=1, max_delay=5):
        self.proxies = cycle(proxies)
        self.min_delay = min_delay
        self.max_delay = max_delay

    def validate_proxy(self, proxy):
        # Check that the proxy works by fetching a lightweight test URL
        try:
            response = requests.get(
                'https://httpbin.org/ip',
                proxies={'http': proxy, 'https': proxy},
                timeout=5
            )
            return response.status_code == 200
        except requests.RequestException:
            return False

    def get_next_proxy(self):
        proxy = next(self.proxies)
        if self.validate_proxy(proxy):
            # Random delay between requests to avoid a predictable pattern
            time.sleep(random.uniform(self.min_delay, self.max_delay))
            return proxy
        return self.get_next_proxy()  # Try the next proxy

Now, you can use the class ProxyManager while web scraping Amazon. Just initialize the class with a proxy list and the minimum and maximum delays between requests.

proxies = ['http://proxy1:8080', 'http://proxy2:8080']
manager = ProxyManager(proxies, min_delay=2, max_delay=4)

# Get a new, validated proxy
proxy = manager.get_next_proxy()
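
Putting the pieces together, here’s a minimal sketch that fetches a page through a validated proxy; fetch_page is an illustrative helper, not part of the original code:

def fetch_page(url, manager, max_attempts=5):
    # Try up to max_attempts validated proxies before giving up
    for _ in range(max_attempts):
        proxy = manager.get_next_proxy()
        try:
            response = requests.get(
                url,
                proxies={'http': proxy, 'https': proxy},
                timeout=10
            )
            if response.status_code == 200:
                return response.text
        except requests.RequestException:
            continue
    raise Exception('All attempts failed')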

Important Considerations When Using IP Rotation

Each approach has its strengths. Using urllib/requests is simple and good for basic needs; however, Playwright is necessary for handling dynamic websites.

Remember to implement appropriate delays between requests and validate proxies before use to maintain a stable and respectful scraping operation. Consider using a proxy service that provides an API for rotating IPs automatically, as this can be more reliable than managing your own proxy pool.
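
Such services typically expose a single gateway endpoint that rotates the exit IP for you, so your code doesn’t need a proxy pool at all. A minimal sketch, where the gateway host, port, and credentials are placeholders rather than a real service:

# Hypothetical rotating-proxy gateway; host, port, and credentials are placeholders
gateway = 'http://username:password@gateway.example.com:8000'

response = requests.get(
    'https://amazon.com',
    proxies={'http': gateway, 'https': gateway},
    timeout=10
)
# Each request may exit from a different IP chosen by the service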

Also, implement proper error handling and logging in production environments to track proxy performance and identify issues quickly.
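
For instance, here’s a minimal sketch of per-proxy success and failure logging with Python’s standard logging module; the proxy_stats counter and the record_result helper are illustrative:

import logging
from collections import defaultdict

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('proxy_rotation')

# Success/failure counts per proxy
proxy_stats = defaultdict(lambda: {'success': 0, 'failure': 0})

def record_result(proxy, ok):
    key = 'success' if ok else 'failure'
    proxy_stats[proxy][key] += 1
    logger.info('proxy=%s result=%s totals=%s', proxy, key, dict(proxy_stats[proxy]))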

Want to know more about IP rotation? Read this article on using proxies and rotating IP addresses.

Why Use a Web Scraping Service

You can manage proxies yourself while scraping Amazon. Just create a list of proxies and use itertools to cycle through them. 

However, choosing the right proxies and managing them yourself can be cumbersome, especially if you only need the data. It’s better to use a web scraping service in that case.

With a web scraping service like ScrapeHero, you won’t have to bother about choosing and managing proxies or other technical aspects of web scraping. We’ll take care of all that.

ScrapeHero is an enterprise-grade web scraping service that can build high-quality scrapers and crawlers for you. Our services can also handle your complete data pipeline, including robotic process automation and custom AI solutions. 


Ready to turn the internet into meaningful and usable data?

Contact us to schedule a brief, introductory call with our experts and learn how we can assist your needs.
