Need to Scrape Yelp Reviews? Check Out This Tutorial


The dynamic nature of Yelp’s website can make scraping its reviews tricky. However, you can still extract them using browser automation libraries that can execute JavaScript.

This tutorial shows you how to scrape Yelp using Python with Selenium, a powerful browser automation library.

How to Scrape Yelp Reviews: The Environment

This script uses two external Python libraries: Selenium and BeautifulSoup. Selenium enables you to navigate the Yelp website, run JavaScript, and extract the HTML code of the review page, while BeautifulSoup helps parse this code to extract the review data.

You can install both libraries, along with the lxml parser that BeautifulSoup uses in this tutorial, with Python’s package manager, pip.

pip install beautifulsoup4 selenium lxml

In addition to these libraries, this code also uses Python’s built-in packages: json and time. The json module allows you to handle JSON data, including saving it as a file; the time module allows you to implement waits for HTML elements to load properly.

There are plenty of libraries for web scraping. Check out this article on the top Python web scraping libraries and frameworks.

How to Scrape Yelp Reviews: The Data Extracted

This tutorial scrapes five key data points from each review on Yelp:

  1. Reviewer’s name
  2. Reviewer’s address
  3. Rating
  4. Comment
  5. Review date

You’ll need to analyze the HTML source code of the review page to identify how to locate the elements containing these data points. Use developer tools to do so. Right-click on a data point, click ‘Inspect,’ and look for unique attributes.

Unique attribute of the review text shown in the browser’s developer tools
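
Once you’ve found a candidate attribute, you can sanity-check it before writing the full scraper. Here’s a minimal sketch, assuming you already have the page’s HTML in a string named html; it counts how many review elements the class-based selector from this tutorial matches:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# If this prints 0, the class name has likely changed and needs re-inspecting.
print(len(soup.find_all('li', {'class': 'y-css-1sqelp2'})))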

Need a tutorial for web scraping Yelp business details? Check this article on how to scrape Yelp data using Python.

How to Scrape Yelp Reviews: The Code

After understanding the attributes of the elements holding the required data, you can start writing your code.

Start by importing the necessary packages.

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from time import sleep
import json

The above code doesn’t import the entire Selenium library because, for basic Selenium web scraping, you only need two of its modules: webdriver and By.

  • The webdriver module has methods to control the browser.
  • The By module lets you specify how to locate elements (by XPath, class, ID, etc.).
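
For instance, here’s a short sketch of how these locator strategies translate into find_element() calls; it assumes the browser object created later in this tutorial and reuses selectors from Yelp’s review page:

# Each By strategy changes how Selenium interprets the selector string.
reviews_box = browser.find_element(By.ID, 'reviews')
user_info = browser.find_element(By.CLASS_NAME, 'user-passport-info')
next_link = browser.find_element(By.XPATH, '//a[@aria-label="Next"]')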

Selenium has several automation indicator flags. You need to disable them to keep your scraper under the radar on Yelp. For Chrome, create a ChromeOptions() object and add arguments related to the flags.

options = webdriver.ChromeOptions()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

Next, launch the Selenium browser using webdriver.Chrome() with your options object as an argument.

browser = webdriver.Chrome(options=options)

You can then navigate to Yelp’s review page using the get() method.

browser.get('https://www.yelp.com/biz/the-st-regis-san-francisco-san-francisco')

After loading the page, scrape the details. However, the reviews span multiple pages, so you’ll need to loop through each one and extract reviews. Define the maximum number of pages to scrape; it’s hardcoded here, but you could also prompt the user for it with input().

max_pages = 1

Use this value as a range for a loop that starts by waiting 10 seconds for review elements to load.

for _ in range(int(max_pages)):
    sleep(10)
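
Note that a fixed sleep() always waits the full 10 seconds, even when the page loads faster. As an alternative sketch, Selenium’s explicit waits block only until the target element appears:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the div with the id 'reviews' to be present.
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.ID, 'reviews'))
)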

Then, the loop retrieves the page’s HTML source code using the page_source attribute and passes it to BeautifulSoup for parsing.

html = browser.page_source
soup = BeautifulSoup(html,'lxml')

The next step is to locate review elements, which are all inside a div element with the id reviews. You can locate the div element using BeautifulSoup’s find method.

review_div = soup.find('div',{'id':'reviews'})

To extract all the review elements, use find_all().

reviews = review_div.find_all('li', {'class':'y-css-1sqelp2'})

Once all the review elements are located, iterate through them to find the necessary details:

  • User name and address from a div element with the class ‘user-passport-info’
  • Rating from the aria-label of a div element with the class ‘y-css-dnttlc’
  • Date from a span element with the class ‘y-css-1d8mpv1’
  • Review text from a p element

for review in reviews:
    user = review.find('div',{'class':'user-passport-info'})
    name = user.span.text
    address = user.div.text
    rating = review.find('div',{'class':'y-css-dnttlc'})['aria-label']
    date = review.find('span',{'class':'y-css-1d8mpv1'}).text
    comment = review.p.text

Finally, append the extracted data as a dict to a list defined before the loop (review_data in the full script below). Then, locate and click the link to the next page before repeating the loop; this continues until the scraper has gone through the requested number of pages.

    review_data.append(
        {
            'Name':name,
            'Address':address,
            'Rating':rating,
            'Date':date,
            'Comment':comment
        }
    )

browser.find_element(By.XPATH,'//a[@aria-label="Next"]').click()

After finishing the loop, save the extracted data in a JSON file using json.dump(). In the full script below, the file name (name) is derived from the business slug in the URL.

with open(f'{name}.json','w',encoding='utf-8') as f:
    json.dump(review_data,f,indent=4,ensure_ascii=False)

You can also modify this code to use a CSV file for the review URLs, allowing you to extract reviews of multiple businesses using a single script.

To do that, encapsulate your current logic into a function scrape_yelp_review() that accepts a URL.

You can then read your CSV file, which should contain a url column, using pandas and iterate through each URL in it, calling scrape_yelp_review() with each URL as an argument.

if __name__=="__main__":

    urls = pandas.read_csv('yelp.csv',encoding='utf-8')
    for url in urls['url']:
        scrape_yelp_review(url)

The scraped Yelp reviews would look like this:

{
    "Name": "Name N.",
    "Address": "San Diego, CA",
    "Rating": "5 star rating",
    "Date": "Oct 7, 2024",
    "Comment": "No words will be enough to describe this bed and breakfast. My experiences were great. They have friendly staff, clean rooms, and well decorated common areas. You can also enjoy the Inn’s book collection."
}

Here’s the complete code for Yelp data scraping:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from time import sleep

import json, pandas

def scrape_yelp_review(link):
    options = webdriver.ChromeOptions()
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)

    browser = webdriver.Chrome(options=options)
    browser.get(link)
    review_data = []

    max_pages = 1

    for _ in range(int(max_pages)):
        sleep(10)
        html = browser.page_source
        soup = BeautifulSoup(html,'lxml')

        review_div = soup.find('div',{'id':'reviews'})
        reviews = review_div.find_all('li', {'class':'y-css-1sqelp2'})

        for review in reviews:

            try:
                user = review.find('div',{'class':'user-passport-info'})
                name = user.span.text
                address = user.div.text
                rating = review.find('div',{'class':'y-css-dnttlc'})['aria-label']
                date = review.find('span',{'class':'y-css-1d8mpv1'}).text
                comment = review.p.text

                review_data.append(
                    {
                        'Name':name,
                        'Address':address,
                        'Rating':rating,
                        'Date':date,
                        'Comment':comment
                    }
                )
            except Exception as e:
                print(e)
        try:
            browser.find_element(By.XPATH,'//a[@aria-label="Next"]').click()
        except Exception:
            # No next-page link means this was the last page, so exit the loop.
            print('No more pages')
            break
    # Derive the file name from the business slug in the URL.
    name = link.split('/')[4]
    with open(f'{name}.json','w',encoding='utf-8') as f:
        json.dump(review_data,f,indent=4,ensure_ascii=False)
    browser.quit()

if __name__=="__main__":

    urls = pandas.read_csv('yelp.csv',encoding='utf-8')
    for url in urls['url']:
        scrape_yelp_review(url)

Want to get leads from Yelp? Read this article on lead scraping using ScrapeHero Cloud.

Code Limitations

While the code shown in this tutorial can scrape Yelp reviews, there are some limitations:

  • It doesn’t use advanced techniques, such as proxy rotation, to avoid anti-scraping measures, making it unsuitable for large-scale web scraping (a minimal proxy sketch follows this list).
  • You must monitor Yelp’s website for changes in HTML structure and update your code accordingly, or it will fail to extract Yelp reviews.
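
As a minimal sketch of the first point, you can route Chrome through a proxy with a single argument; the address below is a placeholder, and a large-scale setup would rotate through a pool of such proxies:

# Hypothetical proxy address; replace it with one from your provider.
PROXY = 'http://203.0.113.10:8080'

options = webdriver.ChromeOptions()
options.add_argument(f'--proxy-server={PROXY}')
browser = webdriver.Chrome(options=options)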

Yelp Review Scraper: An Alternative

If you want a no-code solution for scraping Yelp reviews, check out ScrapeHero Cloud’s Yelp Review Scraper. This ready-to-use cloud scraper can deliver high-quality Yelp reviews with a few clicks; here’s how to get started for free:

  1. Go to Yelp Review Scraper and sign in.
  2. Click on ‘Create New Project.’
  3. Name your project and paste the URLs of Yelp business listings.
  4. Click ‘Gather Data.’

The scraper will begin collecting Yelp reviews. Once it’s finished, you can download the extracted reviews from ‘My Projects’ under ‘Project’ on the left pane. 

Want to scrape business listings instead? Read this article on scraping Yelp business listings.

Additional Features

Besides no-code scraping, ScrapeHero Cloud also allows you to:

  • Set your scraper to run at regular intervals.
  • Link your cloud storage to have the extracted data sent directly there.
  • Integrate the scraper into your workflow using an API.

Why Use a Web Scraping Service

You can scrape Yelp reviews yourself using Python libraries like Selenium and BeautifulSoup. However, if you are only interested in getting reviews, why not consider a web scraping service?

A web scraping service like ScrapeHero can handle all the technicalities, including negotiating anti-scraping measures and monitoring your target website for changes.

ScrapeHero is an enterprise-grade web scraping service provider capable of building high-quality scrapers and crawlers.


Ready to turn the internet into meaningful and usable data?

Contact us to schedule a brief, introductory call with our experts and learn how we can assist your needs.
