How to Scrape Amazon Product Data: Using Code and No Code Approaches


This article outlines methods to scrape Amazon product data so you can export it to Excel or other formats for easier access and use.

There are three methods to scrape Amazon product data:

  1. Building an Amazon scraper in Python or JavaScript
  2. Using the no-code Amazon Product Details and Pricing Scraper by ScrapeHero Cloud
  3. Using the Amazon Product Details and Pricing API by ScrapeHero Cloud

ScrapeHero Cloud offers ready-made web crawlers and real-time APIs, which are the easiest way to extract data from websites and download it into spreadsheets with a few clicks.

Don’t want to code? ScrapeHero Cloud is exactly what you need. With ScrapeHero Cloud, you can download data in just two clicks!

Building an Amazon Scraper in Python or JavaScript

In this section, we will guide you on how to scrape Amazon product data using either Python or JavaScript. We will utilize the browser automation framework called Playwright to emulate browser behavior in our code. 

You could also use Python Requests, BeautifulSoup, or LXML to build an Amazon scraper without using a browser or a browser automation library. However, bypassing the anti-scraping mechanisms put in place can be challenging and is beyond the scope of this article.

Here are the steps for scraping Amazon product data using Playwright:

Step 1: Choose either Python or JavaScript as your programming language.

Step 2: Install Playwright for your preferred language.

Python:

pip install playwright
# to download the necessary browsers
playwright install

JavaScript:

npm install playwright@latest

Step 3: Write your code to emulate browser behavior and extract the desired data from Amazon using the Playwright API. You can use the following code:

Python:

import asyncio
import json

from playwright.async_api import async_playwright

url = "https://www.amazon.com/Imploding-Kittens-First-Expansion-Exploding/dp/B01HSIIFQ2/]?th=1"


async def extract_data(page) -> None:
    """
    Parsing details from the product page and saving them to a JSON file

    Args:
        page: Playwright page object of the product page
    """

    # Initializing selectors and xpaths
    title_xpath = "h1[id='title']"
    asin_selector = "//td/div[@id='averageCustomerReviews']"
    rating_xpath = "//div[@id='prodDetails']//i[contains(@class,'review-stars')]/span"
    ratings_count_xpath = "//div[@id='prodDetails']//span[@id='acrCustomerReviewText']"
    selling_price_xpath = "//input[@id='priceValue']"
    listing_price_xpath = "//div[@id='apex_desktop_qualifiedBuybox']//span[@class='a-price a-text-price']/span[@class='a-offscreen']"
    img_link_xpath = "//div[contains(@class,'imgTagWrapper')]//img"
    brand_xpath = (
        "//tr[contains(@class,'po-brand')]//span[@class='a-size-base po-break-word']"
    )
    status_xpath = "//div[@id='availabilityInsideBuyBox_feature_div']//div[@id='availability']/span"
    description_ul_xpath = (
        "//ul[@class='a-unordered-list a-vertical a-spacing-mini']/li"
    )
    product_description_xpath = "//div[@id='productDescription']//span"

    # Waiting for the page to finish loading
    await page.wait_for_selector(title_xpath)

    # Extracting the elements
    product_title = (
        await page.locator(title_xpath).inner_text()
        if await page.locator(title_xpath).count()
        else None
    )
    asin = (
        await page.locator(asin_selector).get_attribute("data-asin")
        if await page.locator(asin_selector).count()
        else None
    )
    rating = (
        await page.locator(rating_xpath).inner_text()
        if await page.locator(rating_xpath).count()
        else None
    )
    rating_count = (
        await page.locator(ratings_count_xpath).inner_text()
        if await page.locator(ratings_count_xpath).count()
        else None
    )
    selling_price = (
        await page.locator(selling_price_xpath).get_attribute("value")
        if await page.locator(selling_price_xpath).count()
        else None
    )
    listing_price = (
        await page.locator(listing_price_xpath).inner_text()
        if await page.locator(listing_price_xpath).count()
        else None
    )
    brand = (
        await page.locator(brand_xpath).inner_text()
        if await page.locator(brand_xpath).count()
        else None
    )
    product_description = (
        await page.locator(product_description_xpath).inner_text()
        if await page.locator(product_description_xpath).count()
        else None
    )
    image_link = (
        await page.locator(img_link_xpath).get_attribute("src")
        if await page.locator(img_link_xpath).count()
        else None
    )
    status = (
        await page.locator(status_xpath).inner_text()
        if await page.locator(status_xpath).count()
        else None
    )

    # full_description is found as list, so iterating the list elements to get the descriptions
    full_description_list = []
    desc_lists = page.locator(description_ul_xpath)
    desc_count = await desc_lists.count()
    for index in range(desc_count):
        li_element = desc_lists.nth(index=index)
        desc = (
            await li_element.locator("//span").inner_text()
            if await li_element.locator("//span").count()
            else None
        )
        full_description_list.append(desc)
    full_description = " | ".join(full_description_list)

    # cleaning data
    product_title = clean_data(product_title)
    asin = clean_data(asin)
    rating = clean_data(rating)
    rating_count = clean_data(rating_count)
    selling_price = clean_data(selling_price)
    listing_price = clean_data(listing_price)
    brand = clean_data(brand)
    image_link = clean_data(image_link)
    status = clean_data(status)
    product_description = clean_data(product_description)
    full_description = clean_data(full_description)

    data_to_save = {
        "product_title": product_title,
        "asin": asin,
        "rating": rating,
        "rating_count": rating_count,
        "selling_price": selling_price,
        "listing_price": listing_price,
        "brand": brand,
        "image_links": image_link,
        "status": status,
        "product_description": product_description,
        "full_description": full_description,
    }

    save_data(data_to_save, "Data.json")


async def run(playwright) -> None:
    # Initializing the browser and creating a new page.
    browser = await playwright.chromium.launch(headless=False)
    context = await browser.new_context()
    page = await context.new_page()

    await page.set_viewport_size({"width": 1920, "height": 1080})
    page.set_default_timeout(300000)

    # Navigating to the product page
    await page.goto(url, wait_until="domcontentloaded")
    await extract_data(page)

    await context.close()
    await browser.close()


def clean_data(data: str) -> str:
    """
    Cleaning data by removing extra white spaces and Unicode characters

    Args:
        data (str): data to be cleaned

    Returns:
        str: cleaned string
    """
    if not data:
        return None
    cleaned_data = " ".join(data.split()).strip()
    cleaned_data = cleaned_data.encode("ascii", "ignore").decode("ascii")
    return cleaned_data


def save_data(product_page_data: dict, filename: str):
    """Writing the product details dictionary to a JSON file

    Args:
        product_page_data (dict): details of the product
        filename (str): name of the JSON file
    """
    with open(filename, "w") as outfile:
        json.dump(product_page_data, outfile, indent=4)


async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)


if __name__ == "__main__":
    asyncio.run(main())

JavaScript:

const { chromium } = require('playwright');
const fs = require('fs');
const url = "https://www.amazon.com/Imploding-Kittens-First-Expansion-Exploding/dp/B01HSIIFQ2/?th=1";
/**
 * Save the extracted data as a JSON file
 * @param {object} data
 */
function saveData(data) {
    let dataStr = JSON.stringify(data, null, 2)
    fs.writeFile("data.json", dataStr, 'utf8', function (err) {
        if (err) {
            console.log("An error occurred while writing JSON Object to File.");
            return console.log(err);
        }
        console.log("JSON file has been saved.");
    });
}
function cleanData(data) {
    if (!data) {
        return null;
    }
    // removing extra spaces and unicode characters
    let cleanedData = data.split(/\s+/).join(" ").trim();
    cleanedData = cleanedData.replace(/[^\x00-\x7F]/g, "");
    return cleanedData;
}
// The data extraction function used to extract
// necessary data from the element.
async function extractData(data, type) {
    let count = await data.count();
    if (count) {
        if (type == 'innerText') {
            return await data.innerText()    
        }else {
            return await data.getAttribute(type)
        }
    }
    return null
};
async function parsePage(page) {
    // initializing xpaths
    let titleXPath = "h1[id='title']";
    let asinSelector = "//td/div[@id='averageCustomerReviews']";
    let ratingXPath = "//div[@id='prodDetails']//i[contains(@class,'review-stars')]/span";
    let ratingsCountXPath = "//div[@id='prodDetails']//span[@id='acrCustomerReviewText']";
    let sellingPriceXPath = "//input[@id='priceValue']";
    let listingPriceXPath = "//div[@id='apex_desktop_qualifiedBuybox']//span[@class='a-price a-text-price']/span[@class='a-offscreen']";
    let imgLinkXPath = "//div[contains(@class,'imgTagWrapper')]//img";
    let brandXPath = "//tr[contains(@class,'po-brand')]//span[@class='a-size-base po-break-word']";
    let statusXPath = "//div[@id='availabilityInsideBuyBox_feature_div']//div[@id='availability']/span";
    let descriptionULXPath = "//ul[@class='a-unordered-list a-vertical a-spacing-mini']/li";
    let productDescriptionXPath = "//div[@id='productDescription']//span";
    // wait until page loads
    await page.waitForSelector(titleXPath);
    // extract data using xpath
    let productTitle = page.locator(titleXPath);
    productTitle = await extractData(productTitle, 'innerText');
    let asin = page.locator(asinSelector);
    asin = await extractData(asin, 'data-asin');
    let rating = page.locator(ratingXPath);
    rating = await extractData(rating, 'innerText');
    let ratingCount = page.locator(ratingsCountXPath);
    ratingCount = await extractData(ratingCount, 'innerText');
    let sellingPrice = page.locator(sellingPriceXPath);
    sellingPrice = await extractData(sellingPrice, 'value');
    let listingPrice = page.locator(listingPriceXPath);
    listingPrice = await extractData(listingPrice, 'innerText');
    let brand = page.locator(brandXPath);
    brand = await extractData(brand, 'innerText');
    let productDescription = page.locator(productDescriptionXPath);
    productDescription = await extractData(productDescription, 'innerText');
    let imageLink = page.locator(imgLinkXPath);
    imageLink = await extractData(imageLink, 'src');
    let status = page.locator(statusXPath);
    status = await extractData(status, 'innerText');
    // since the full description is in <li> elements, iterate over them
    let fullDescriptionList = [];
    let descLists = page.locator(descriptionULXPath);
    let descCount = await descLists.count();
    for (let index = 0; index < descCount; index++) {
        let liElement = descLists.nth(index);
        let desc = liElement.locator('//span');
        desc = await extractData(desc, 'innerText');
        fullDescriptionList.push(desc);
    }
    let fullDescription = fullDescriptionList.join(" | ") || null;
    // cleaning data
    productTitle = cleanData(productTitle);
    asin = cleanData(asin);
    rating = cleanData(rating);
    ratingCount = cleanData(ratingCount);
    sellingPrice = cleanData(sellingPrice);
    listingPrice = cleanData(listingPrice);
    brand = cleanData(brand);
    imageLink = cleanData(imageLink);
    status = cleanData(status);
    productDescription = cleanData(productDescription);
    fullDescription = cleanData(fullDescription);
    let dataToSave = {
        productTitle: productTitle,
        asin: asin,
        rating: rating,
        ratingCount: ratingCount,
        sellingPrice: sellingPrice,
        listingPrice: listingPrice,
        brand: brand,
        imageLinks: imageLink,
        status: status,
        productDescription: productDescription,
        fullDescription: fullDescription,
    };
    saveData(dataToSave);
}
/**
 * The main function initiates a browser object and handles the navigation.
 */
async function run() {
    // initializing browser and creating new page
    const browser = await chromium.launch({ headless: false });
    const context = await browser.newContext();
    const page = await context.newPage();
    await page.setViewportSize({ width: 1920, height: 1080 });
    page.setDefaultTimeout(30000);
    // Navigating to the product page
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    await parsePage(page);
    await context.close();
    await browser.close();
}
run();

This code shows how to scrape Amazon using the Playwright library in Python and JavaScript.

The corresponding scripts have two main functions, namely:

  1. run function: This function takes a Playwright instance as input and performs the scraping process. It launches a Chromium browser instance, creates a new page, and navigates to the Amazon product page.
    The extract_data function is then called to extract the product details and save the data to a JSON file.
  2. extract_data function: This function takes a Playwright page object as input and extracts product details such as the title, ASIN, brand, rating, selling price, listing price, availability status, and description.

Finally, the main function uses the async_playwright context manager to execute the run function. Running the script creates a JSON file containing the product data for the Amazon listing you scraped.
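
Since this article is also about exporting Amazon product data to Excel, here is a minimal sketch (assuming the pandas package, plus openpyxl for .xlsx output, is installed) that converts the Data.json file produced by the Python script above into a spreadsheet:

import json

import pandas as pd

# Load the product details written by the Playwright scraper above
with open("Data.json") as f:
    product = json.load(f)

# The scraper saves a single product as one dictionary; wrapping it in a
# list turns it into a single spreadsheet row.
df = pd.DataFrame([product])
df.to_excel("amazon_product_data.xlsx", index=False)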

Step 4: Run your code to scrape Amazon product data.

Using No-Code Amazon Product Details and Pricing Scraper by ScrapeHero Cloud

The Amazon Product Details and Pricing Scraper by ScrapeHero Cloud is a convenient method for scraping product details from Amazon. It provides an easy, no-code method for scraping data, making it accessible for individuals with limited technical skills.

This section will guide you through the steps to set up and use the Amazon Product Details and Pricing scraper.

1. Sign up or log in to your ScrapeHero Cloud account.

2. Go to the Amazon Product Details and Pricing scraper by ScrapeHero Cloud.

3. Click the Create New Project button.

4. To scrape the details, you need to provide either the product URL or the ASIN.

  • You can get the product URL from the Amazon search results page.

  • You can get the product’s ASIN from the product information section of a product listing page.

5. In the fields provided, enter a project name, the product URL or ASIN, and the maximum number of records you want to gather. Then, click the Gather Data button to start the scraper.

6. The scraper will start fetching data for your queries, and you can track its progress under the Projects tab.

7. Once it is finished, you can view the data by clicking on the project name. A new page will appear, and under the Overview tab, you can see and download the data.

8. You can also pull Amazon product data into a spreadsheet from here. Just click on Download Data, select Excel, and open the downloaded file using Microsoft Excel.

Using Amazon Product Details and Pricing API by ScrapeHero Cloud

The ScrapeHero Cloud Amazon Product Details and Pricing API is an alternate tool for extracting product details from Amazon. This user-friendly API enables those with minimal technical expertise to obtain product data effortlessly from Amazon.

This section will walk you through the steps to configure and use the Amazon Product Details and Pricing API provided by ScrapeHero Cloud:

1. Sign up or log in to your ScrapeHero Cloud account.

2. Go to the Amazon Product Details and Pricing API by ScrapeHero Cloud in the marketplace.

3. Click on the Try This API button.

4. In the field provided, enter the product ASIN. You can also provide the country code if you want. Click Send request.

5. You will get the results in the response window on the bottom right side of the page.

Use Cases of Amazon Product Data

If you’re unsure why you should scrape Amazon product data, here are a few use cases where this data would be helpful:

  • Market Analysis and Competitive Intelligence

By scraping Amazon product data, businesses can analyze market trends, understand consumer preferences, and monitor competitor activities.

  • Price Optimization

Retailers and sellers can scrape Amazon prices with an Amazon price scraper and use the data to optimize their pricing strategies by analyzing the pricing patterns of similar products.

  • Product Development and Innovation

Manufacturers and brands can scrape Amazon product data to identify gaps in the market, understand consumer pain points, and gather ideas for product improvements or new product features.

  • Reputation and Brand Management

The data obtained by the Amazon data scraper can be used to monitor product reviews and ratings on Amazon. It also helps businesses manage their online reputation and respond effectively to customer feedback. 

  • Inventory and Supply Chain Management

With Amazon scraping, businesses can better forecast demand, optimize stock levels, and reduce inventory holding costs by analyzing sales velocity, seasonal trends, and consumer demand patterns on Amazon.

Frequently Asked Questions

Can you scrape data from Amazon?

Yes. You can scrape Amazon product data by using a Python or JavaScript scraper. If you do not want to code, then use ScrapeHero Amazon Product Details and Pricing Scraper. 
You can also choose the Amazon Product Details and Pricing API by ScrapeHero Cloud to integrate with any application to stream product data.

How to scrape Amazon product information using BeautifulSoup?

To scrape Amazon product information using BeautifulSoup, send GET requests to the product’s page using the Requests library, then parse the HTML response using BeautifulSoup to extract essential information like name, price, and description. 
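
Here is a minimal sketch of that approach (the request headers and the CSS selectors below are illustrative assumptions based on Amazon’s typical markup, which changes often, and Amazon may still block plain HTTP requests):

import requests
from bs4 import BeautifulSoup

# Illustrative product URL and a basic User-Agent header; Amazon may still
# block requests that lack full browser fingerprints.
url = "https://www.amazon.com/dp/B01HSIIFQ2"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

# These selectors are assumptions and may need updating if Amazon's markup changes
title = soup.select_one("#productTitle")
price = soup.select_one("span.a-offscreen")
description = soup.select_one("#productDescription")

print({
    "title": title.get_text(strip=True) if title else None,
    "price": price.get_text(strip=True) if price else None,
    "description": description.get_text(strip=True) if description else None,
})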

How can you scrape Amazon using Selenium (Python)?

For web scraping Amazon using Selenium (Python), you have to set up Selenium WebDriver to automate a web browser and navigate to the Amazon product page. Then, use locators like By.XPATH to find and interact with page elements.
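
As a minimal sketch (assuming Selenium 4.6+ with Chrome installed, so the driver binary is managed automatically; the locators are illustrative assumptions):

from selenium import webdriver
from selenium.webdriver.common.by import By

# Selenium 4.6+ downloads a matching ChromeDriver automatically
driver = webdriver.Chrome()
driver.get("https://www.amazon.com/dp/B01HSIIFQ2")  # illustrative product URL

# These locators are assumptions based on Amazon's current markup
title = driver.find_element(By.XPATH, "//span[@id='productTitle']").text
price = driver.find_element(By.XPATH, "//span[@class='a-offscreen']").text

print({"title": title, "price": price})
driver.quit()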

Does Amazon allow review scraping? 

Amazon does not directly support or encourage web scraping, but scraping publicly available data is not illegal. 
You can scrape Amazon product reviews using Python or JavaScript. 
ScrapeHero provides an Amazon Product Reviews and Ratings Scraper, a no-code Amazon product data scraping tool for this purpose. 
You can also use the Amazon Reviews API by ScrapeHero Cloud for integrating with applications.

What is the subscription fee for the Amazon Product Details and Pricing Scraper by ScrapeHero?

ScrapeHero provides a comprehensive pricing plan for both Scrapers and APIs. To know more about the pricing, visit our pricing page.

Is it legal to scrape from Amazon?

The legality of web scraping depends on the legal jurisdiction, i.e., laws specific to the country and the locality. Gathering or scraping publicly available information is not illegal.

Ready to turn the internet into meaningful and usable data?

Contact us to schedule a brief, introductory call with our experts and learn how we can assist your needs.
