Learn how to scrape Amazon reviews for free using the ScrapeHero Cloud crawler. Scrape review details from Amazon such as title, content, ASIN, date, and more.
This article outlines a few methods to scrape Amazon reviews and ratings, which you can then export to Excel or other formats for easier access and use.
There are three methods to scrape Amazon Reviews:
- Scraping Amazon Reviews in Python or JavaScript
- Using the ScrapeHero Cloud, Amazon Product Reviews and Ratings Scraper, a no-code tool
- Using Amazon Reviews API by ScrapeHero Cloud
If you don’t like coding or simply don’t want to code, ScrapeHero Cloud is just right for you!
Skip the hassle of installing software, programming, and maintaining the code. Download this data using ScrapeHero Cloud within seconds.
Build an Amazon Review Scraper in Python/JavaScript
In this section, we will guide you on how to scrape Amazon Reviews using either Python or JavaScript. We will utilize the browser automation framework called Playwright to emulate browser behavior in our code.
One of the key advantages of this approach is its ability to bypass common blocks often put in place to prevent scraping. However, familiarity with the Playwright API is necessary to use it effectively.
You could also use Python Requests, LXML, or Beautiful Soup to build an Amazon scraper without using a browser or a browser automation library. However, bypassing the anti-scraping mechanisms Amazon puts in place can be challenging and is beyond the scope of this article.
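For reference, a request-based approach would look roughly like the sketch below. It is a minimal illustration, not a drop-in solution: the review-page URL pattern and the data-hook selectors are assumptions based on Amazon's current markup (the same review-body hook is used in the Playwright code later in this article), and plain requests like this are likely to be blocked without additional anti-blocking measures.

# Minimal sketch of a request-based scraper (assumed selectors; likely to be blocked without more work)
import requests
from bs4 import BeautifulSoup

# Assumed review-page URL pattern, using the ASIN from the example product below
url = "https://www.amazon.com/product-reviews/B0014C2NBC"
headers = {"User-Agent": "Mozilla/5.0"}  # a minimal header set; real scraping needs more

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

reviews = []
for card in soup.select("div[data-hook='review']"):  # assumed review-card hook
    title = card.select_one("a[data-hook='review-title']")
    body = card.select_one("span[data-hook='review-body']")
    reviews.append({
        "title": title.get_text(strip=True) if title else None,
        "text": body.get_text(strip=True) if body else None,
    })

print(reviews)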
Here are the steps to scrape Amazon reviews data using Playwright:
Step 1: Choose either Python or JavaScript as your programming language.
Step 2: Install Playwright for your preferred language:
Python:
pip install playwright
# to download the necessary browsers
playwright install
JavaScript:
npm install playwright@latest
Step 3: Write your code to emulate browser behavior and extract the review data from Amazon using the Playwright API. You can use the code provided below:
Python:

import asyncio
import json

from playwright.async_api import async_playwright

url = "https://www.amazon.com/Crocs-Unisex-Classic-Black-Women/dp/B0014C2NBC/ref=sr_1_3?crid=2WI0QM6BYFKLZ&keywords=crocs&qid=1696915791&s=amazon-devices&sprefix=croc%2Camazon-devices%2C449&sr=1-3"
max_pagination = 2
async def extract_data(page) -> None:
    """
    Parses review details from the review pages and saves them to a JSON file.

    Args:
        page: webpage of the browser
    """
    # Initializing selectors and xpaths
    seemore_selector = "//div[@id='reviews-medley-footer']//a"
    div_selector = "[class='a-section celwidget']"
    next_page_selector = "[class='a-last']"
    name_xpath = "//a[@class='a-profile']//span[@class='a-profile-name']"
    rate_xpath = "//a//i[contains(@class,'review-rating')]/span"
    review_title_xpath = "//a[contains(@class, 'review-title')]/span[2]"
    review_date_xpath = "//span[contains(@class,'review-date')]"
    review_text_xpath = "[data-hook='review-body']"

    # Navigating to the review page via the "See more reviews" link
    review_page_locator = page.locator(seemore_selector)
    await review_page_locator.hover()
    await review_page_locator.click()

    # List to save the details of reviews
    amazon_reviews_ratings = []

    # Paginating through each page
    for _ in range(max_pagination):
        # Waiting for the page to finish loading
        await page.wait_for_load_state("load")

        # Extracting the review card elements
        review_cards = page.locator(div_selector)
        cards_count = await review_cards.count()
        for index in range(cards_count):
            # Hovering over the element to make sure it is fully loaded
            inner_element = review_cards.nth(index=index)
            await inner_element.hover()
            inner_element = review_cards.nth(index=index)

            # Extracting necessary data
            name = await inner_element.locator(name_xpath).inner_text() if await inner_element.locator(name_xpath).count() else None
            rate = await inner_element.locator(rate_xpath).inner_text() if await inner_element.locator(rate_xpath).count() else None
            review_title = await inner_element.locator(review_title_xpath).inner_text() if await inner_element.locator(review_title_xpath).count() else None
            review_date = await inner_element.locator(review_date_xpath).inner_text() if await inner_element.locator(review_date_xpath).count() else None
            review_text = await inner_element.locator(review_text_xpath).inner_text() if await inner_element.locator(review_text_xpath).count() else None

            # Removing extra spaces and Unicode characters
            name = clean_data(name)
            rate = clean_data(rate)
            review_title = clean_data(review_title)
            review_date = clean_data(review_date)
            review_text = clean_data(review_text)

            data_to_save = {
                "reviewer_name": name,
                "rate": rate,
                "review_title": review_title,
                "review_date": review_date,
                "review_text": review_text,
            }
            amazon_reviews_ratings.append(data_to_save)

        next_page_locator = page.locator(next_page_selector)
        # Check if the "Next Page" button exists
        if await next_page_locator.count() > 0:
            await next_page_locator.hover()
            await next_page_locator.click()
        else:
            break

    save_data(amazon_reviews_ratings, "Data.json")
async def run(playwright) -> None:
    # Initializing the browser and creating a new page
    browser = await playwright.firefox.launch(headless=False)
    context = await browser.new_context()
    page = await context.new_page()
    await page.set_viewport_size({"width": 1920, "height": 1080})
    page.set_default_timeout(120000)

    # Navigating to the product page
    await page.goto(url, wait_until="domcontentloaded")
    await extract_data(page)

    await context.close()
    await browser.close()
def clean_data(data: str) -> str:
    """
    Cleaning data by removing extra white spaces and Unicode characters

    Args:
        data (str): data to be cleaned

    Returns:
        str: cleaned string
    """
    if not data:
        return ""
    cleaned_data = " ".join(data.split()).strip()
    cleaned_data = cleaned_data.encode("ascii", "ignore").decode("ascii")
    return cleaned_data
def save_data(product_page_data: list, filename: str):
    """Converting a list of dictionaries to JSON format

    Args:
        product_page_data (list): details of each review
        filename (str): name of the JSON file
    """
    with open(filename, "w") as outfile:
        json.dump(product_page_data, outfile, indent=4)
async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)


if __name__ == "__main__":
    asyncio.run(main())
JavaScript:

const { firefox } = require('playwright');
const fs = require('fs');

const url = "https://www.amazon.com/Crocs-Unisex-Classic-Black-Women/dp/B0014C2NBC/ref=sr_1_3?crid=2WI0QM6BYFKLZ&keywords=crocs&qid=1696915791&s=amazon-devices&sprefix=croc%2Camazon-devices%2C449&sr=1-3";
const maxPagination = 2;

/**
 * Saves the list of extracted reviews as a JSON file.
 * @param {object} data
 */
function saveData(data) {
    let dataStr = JSON.stringify(data, null, 2);
    fs.writeFile("data.json", dataStr, 'utf8', function (err) {
        if (err) {
            console.log("An error occurred while writing JSON Object to File.");
            return console.log(err);
        }
        console.log("JSON file has been saved.");
    });
}

/**
 * Removes extra spaces and non-ASCII characters from a string.
 * @param {string} data
 */
function cleanData(data) {
    if (!data) {
        return;
    }
    let cleanedData = data.split(/\s+/).join(" ").trim();
    cleanedData = cleanedData.replace(/[^\x00-\x7F]/g, "");
    return cleanedData;
}
/**
 * Extracts the necessary review details from a single review card.
 * @param {object} innerElement - Playwright locator for a review card
 * @returns {object} the extracted review details
 */
async function extractData(innerElement) {
    // Helper that returns the inner text of a locator, or null if it matches nothing
    async function getText(locator) {
        let count = await locator.count();
        if (count) {
            return await locator.innerText();
        }
        return null;
    }

    // Initializing xpaths and selectors
    const nameXpath = "//a[@class='a-profile']//span[@class='a-profile-name']";
    const rateXpath = "//a//i[contains(@class,'review-rating')]/span";
    const reviewTitleXpath = "//a[contains(@class, 'review-title')]/span[2]";
    const reviewDateXpath = "//span[contains(@class,'review-date')]";
    const reviewTextXpath = "[data-hook='review-body']";

    // Extracting necessary data
    let name = await getText(innerElement.locator(nameXpath));
    let rate = await getText(innerElement.locator(rateXpath));
    let reviewTitle = await getText(innerElement.locator(reviewTitleXpath));
    let reviewDate = await getText(innerElement.locator(reviewDateXpath));
    let reviewText = await getText(innerElement.locator(reviewTextXpath));

    // Cleaning data
    name = cleanData(name);
    rate = cleanData(rate);
    reviewTitle = cleanData(reviewTitle);
    reviewDate = cleanData(reviewDate);
    reviewText = cleanData(reviewText);

    const extractedData = {
        'reviewerName': name,
        'rate': rate,
        'reviewTitle': reviewTitle,
        'reviewDate': reviewDate,
        'reviewText': reviewText
    };
    return extractedData;
}
/**
 * The main function initiates a browser object and handles the navigation.
 */
async function run() {
    // Initializing the browser and creating a new page
    const browser = await firefox.launch({ headless: false });
    const context = await browser.newContext();
    const page = await context.newPage();
    await page.setViewportSize({ width: 1920, height: 1080 });
    page.setDefaultTimeout(120000);

    // Initializing xpaths and selectors
    const seeMoreSelector = "//div[@id='reviews-medley-footer']//a";
    const divSelector = "[class='a-section celwidget']";
    const nextPageSelector = "[class='a-last']";

    // Navigating to the product page
    await page.goto(url, { waitUntil: 'domcontentloaded' });

    // Navigating to the review page via the "See more reviews" link
    const reviewPageLocator = await page.$(seeMoreSelector);
    await reviewPageLocator.hover();
    await reviewPageLocator.click();

    // Wait until the list of reviews is loaded
    await page.waitForSelector(divSelector);

    // To store the extracted data
    let data = [];

    // Navigating through pagination
    for (let pageNum = 0; pageNum < maxPagination; pageNum++) {
        await page.waitForLoadState("load");
        await page.waitForTimeout(10);

        let reviewCards = page.locator(divSelector);
        let reviewCardsCount = await reviewCards.count();

        // Going through each review card
        for (let index = 0; index < reviewCardsCount; index++) {
            await page.waitForTimeout(2000);
            await page.waitForLoadState("load");

            let innerElement = reviewCards.nth(index);
            await innerElement.hover();
            innerElement = reviewCards.nth(index);

            let dataToSave = await extractData(innerElement);
            data.push(dataToSave);
        }

        // To load the next page
        const nextPageLocator = await page.$(nextPageSelector);
        // Check if the "Next Page" button exists
        if (nextPageLocator !== null) {
            await nextPageLocator.hover();
            await nextPageLocator.click();
        } else {
            // Exit the loop when the button is not found
            break;
        }
    }

    saveData(data);
    await context.close();
    await browser.close();
}

run();
This code shows how to scrape Amazon reviews using the Playwright library in Python and JavaScript.
The corresponding scripts have two main functions, namely:
- run function: This function takes a Playwright instance as input and performs the scraping process. It launches a Firefox browser instance, navigates to the Amazon product page, and from there to the review pages.
- extract_data function: This function extracts the review details from each review card and collects them into a list of dictionaries. The details include the reviewer’s name, the rating they gave, the review’s title, the date the review was posted, and the review text. The save_data function then writes this list to a Data.json file.
Finally, the main function uses the async_playwright context manager to execute the run function. Running the script creates a JSON file (Data.json) containing the Amazon reviews you just scraped.
Step 4: Run your code and collect the scraped reviews and ratings from Amazon.
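Both scripts write the scraped reviews to a JSON file. If you would rather have the data in Excel, as mentioned at the start of this article, the JSON output can be converted with a few lines of Python. This is a minimal sketch that assumes pandas and openpyxl are installed (pip install pandas openpyxl) and that the Python script's default output file name, Data.json, is used:

import pandas as pd

# Load the reviews written by the Playwright script above
df = pd.read_json("Data.json")

# Write them to an Excel spreadsheet (uses the openpyxl engine)
df.to_excel("amazon_reviews.xlsx", index=False)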
Using No-Code Amazon Reviews Scraper by ScrapeHero Cloud
The Amazon Reviews and Ratings Scraper by ScrapeHero Cloud is a convenient method for scraping reviews and ratings from Amazon. It provides an easy, no-code method for scraping data, making it accessible for individuals with limited technical skills.
This section will guide you through the steps to set up and use the Amazon Reviews Scraper.
- Sign up or log in to your ScrapeHero Cloud account.
- Go to the Amazon Product Reviews and Ratings Scraper by ScrapeHero Cloud in the marketplace.
- Add the scraper to your account. (Don’t forget to verify your email if you haven’t already.)
- You need to add the product URL or ASIN to start the scraper. If it’s just a single query, enter it in the field provided and choose the number of pages to scrape.
- You can get the product URL from the address bar of the product page in Amazon.
- You can get the ASIN from the product details section of the product page.
- To scrape results for multiple queries, switch to Advanced Mode, and in the Input tab, add the product URLs or ASINs to the SearchQuery field and save the settings.
- To start the scraper, click on the Gather Data button.
- The scraper will start fetching data for your queries, and you can track its progress under the Jobs tab.
- Once finished, you can view or download the data from the same tab.
- You can also export the review data into an Excel spreadsheet from here. Click on Download Data, select “Excel,” and open the downloaded file using Microsoft Excel.
Using Amazon Reviews API by ScrapeHero Cloud
The ScrapeHero Cloud Amazon Reviews API is an alternate tool for extracting reviews and ratings from Amazon. This user-friendly API enables those with minimal technical expertise to obtain review data effortlessly from Amazon.
This section will walk you through the steps to configure and utilize the Amazon Reviews Scraper API provided by ScrapeHero Cloud.
- Sign up or log in to your ScrapeHero Cloud account.
- Go to the Amazon Reviews Scraper API by ScrapeHero Cloud in the marketplace.
- Click on the subscribe button.
- As this is a paid API, you must subscribe to one of the available plans to use the API.
- After subscribing to a plan, head over to the Documentation tab to get the necessary steps to integrate the API into your application.
Note: Amazon limits the number of review pages you can scrape to 10, which is equivalent to scraping a maximum of 100 reviews.
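The exact endpoint, parameter names, and authentication scheme are provided in the Documentation tab once you subscribe. As a rough illustration only, calling a REST API of this kind from Python typically looks like the sketch below; the endpoint URL, parameter names, and auth header used here are placeholders, not the actual ScrapeHero Cloud API.

import requests

# Placeholder values; the real endpoint, parameters, and auth header
# are listed in the Documentation tab of your ScrapeHero Cloud account.
API_ENDPOINT = "https://api.example.com/amazon-reviews"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

params = {
    "asin": "B0014C2NBC",  # product ASIN from the example used earlier
    "page": 1,             # Amazon limits review pages to 10 (about 100 reviews)
}
headers = {"x-api-key": API_KEY}  # hypothetical auth header

response = requests.get(API_ENDPOINT, params=params, headers=headers)
response.raise_for_status()
reviews = response.json()
print(reviews)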
Use Cases of Amazon Review Data
Wondering why you might want to scrape product reviews and ratings from Amazon? Here are some use cases where this information can make a difference:
Business Reputation Management
Keeping tabs on your business reputation involves continuously monitoring how you’re perceived and dissecting customer opinions about your products and services. With detailed review data, you can gain deep insights into both your operational strengths and areas needing improvement.
Competitor Analysis
By examining competitor reviews, you can pinpoint their strengths and weaknesses, compare product features, and identify market gaps that your business can fill. By monitoring these reviews over time, you can detect shifts in consumer preferences and emerging challenges, enabling you to adapt your strategies proactively.
Supply Chain Optimization
If multiple reviews point out issues with product delivery times or conditions upon arrival, this could signal a problem in your supply chain. Data from reviews can help you identify bottlenecks or inefficiencies that need to be addressed, allowing you to make targeted improvements in your logistics and distribution.
Customer Support Assessment
Reviews often mention customer service experiences, whether good or bad. By systematically analyzing this feedback, you can gauge the effectiveness of your customer support team. This enables you to make necessary adjustments, whether it’s retraining staff or revising support protocols, to elevate customer satisfaction.
Price Point Evaluation
Consumers often discuss whether a product offers good value for its price in their reviews. By keeping an eye on these comments, you can assess if your pricing strategy aligns with consumer perception of value.
Frequently Asked Questions
What is the best way to scrape Amazon product data?
The best and most hassle-free way to scrape Amazon product data is to use pre-built scrapers or APIs readily available online. ScrapeHero Cloud’s Amazon Product Data Scraper is quite affordable and comes with a free tier as well. We also offer an Amazon Product Details API for the same.
Can you scrape Amazon reviews?
Yes, it is possible to scrape Amazon reviews. However, Amazon limits the number of pages you can scrape to 10, which is equivalent to scraping a maximum of 100 reviews. You can scrape Amazon reviews either by building a scraper in Python, JavaScript, etc., or by using pre-built scrapers and APIs such as the Amazon Product Reviews and Ratings Scraper and the Amazon Reviews Scraper API from ScrapeHero Cloud.
To know more about the pricing, visit the pricing page.
Is it legal to scrape Amazon reviews?
Legality depends on the legal jurisdiction, i.e., laws specific to the country and the locality. Generally, web scraping is legal if you are scraping publicly available data.
Please refer to our Legal Page to learn more about the legality of web scraping.
Posted in: Developers, Featured, ScrapeHero Cloud, ScrapeHero Data Store Resources, Tutorials, Web Scraping Tutorials
Published On: October 27, 2023