Understand your customers better with Google Reviews.
Analyzing Google reviews is an effective way to gain insights into customer experiences and preferences. This guide explains how to analyze Google reviews using Python, detailing the tools and techniques necessary for data interpretation.
You will learn to use libraries like Pandas and NLTK and visualize your results using tools like Matplotlib.
Steps to Analyze Google Reviews
1. Data Collection
For your own business, you can use the Google My Business API to get reviews. However, to get reviews of other companies, you need to scrape Google Maps.
You can scrape Google reviews in three ways:
1. Writing Code: You can write a script in a programming language that fetches and parses HTML code. While most programming languages can be used for this purpose, Python is the most popular because of its extensive community support and numerous dedicated scraping libraries. Every web scraping script has three parts (a minimal sketch follows this list):
- Fetching the HTML code using libraries like Python’s requests or urllib
- Parsing the HTML code to extract the necessary data with parsers like BeautifulSoup or lxml
- Saving the extracted data with modules like json and csv
2. Using Ready-Made Scrapers: You can use ready-made scrapers like Google Review Scraper from ScrapeHero Cloud, which offers a no-code solution. Within just a few clicks, you can obtain the required data:
- Enter the review URLs
- Save the settings and click ‘Gather Data’
- Download the scraped data
3. Using Web Scraping APIs: You can also use a web scraping API that connects to a ready-made web scraper mentioned above. A web scraping API integrates more seamlessly with your workflow than directly using a ready-made scraper and is more straightforward than writing your own code. An example is the Google Review Scraper API from ScrapeHero Cloud:
- Sign up for a premium membership on ScrapeHero Cloud and get your API key
- Make an HTTP request to the API with the review URL or place ID along with your key
- Save the required data from the response
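To illustrate the fetch-parse-save structure of the first method, here is a minimal sketch. The URL, the CSS selectors, and the field names are placeholders, not real Google Maps selectors; in practice, Google Maps renders reviews with JavaScript, so you would typically need browser automation or one of the ready-made options above rather than a plain requests call.

import csv
import requests
from bs4 import BeautifulSoup

# Hypothetical URL -- real Google Maps pages render reviews with JavaScript
url = 'https://example.com/business/reviews'

# 1. Fetch the HTML
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
response.raise_for_status()

# 2. Parse the HTML and extract the fields you need (selectors are placeholders)
soup = BeautifulSoup(response.text, 'lxml')
reviews = []
for block in soup.select('div.review'):
    reviews.append({
        'reviewBody': block.select_one('.text').get_text(strip=True),
        'ratingValue': block.select_one('.rating').get_text(strip=True),
        'dateCreated': block.select_one('.date').get_text(strip=True),
    })

# 3. Save the extracted data
with open('reviews.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['reviewBody', 'ratingValue', 'dateCreated'])
    writer.writeheader()
    writer.writerows(reviews)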
This article on how to scrape Google Reviews provides detailed explanations of all these methods.
Once you have the reviews, you can use either a programming language to analyze them or opt for a ready-made Google review analyzer. This guide will tell you how to analyze Google reviews using Python.
2. Data Loading
You can use Pandas to load the data, with the method depending on the data format; for instance,
- Use read_csv() to read CSV files
import pandas
df = pandas.read_csv('reviews.csv')
- Use read_json() to read JSON files
df = pandas.read_json('reviews.json')
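After loading, it helps to take a quick look at what actually came in. The column names used throughout this guide (reviewBody, ratingValue, dateCreated) are assumptions that depend on how your scraper saved the data, so confirm them here:

# Inspect the first rows, the column names, and the data types
print(df.head())
print(df.info())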
3. Data Cleaning
Cleaning your data before analysis is essential for ensuring high-quality results. Duplicate entries and missing values can lead to inaccurate conclusions.
Additionally, for sentiment analysis, you must prepare the text by removing stop words and converting it to lowercase.
Here are several methods for cleaning the extracted Google reviews:
- Remove Duplicates
- Fix Missing Values
- Convert Review Text to Lowercase
- Remove Non-Alphanumeric Characters
- Remove Stop Words
Remove Duplicates
You can remove duplicates using the drop_duplicates() method.
df = df.drop_duplicates()
Fix Missing Values
There are two common approaches for handling missing values:
- Dropping missing values using dropna()
df = df.dropna()
- Filling with a constant value
df = df.fillna(0)
Convert to Lowercase
Converting to lowercase ensures that AI models do not differentiate between words that differ only in case. You can convert text using str.lower():
df['reviewBody'] = df['reviewBody'].str.lower()
Remove Non-Alphanumeric Characters
Non-alphanumeric characters are unnecessary for sentiment analysis; removing them improves efficiency:
cleaned_text = ''.join(char for char in reviewText if char.isalnum() or char == ' ')
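The snippet above cleans a single string (reviewText stands for one review). To clean every review in the DataFrame at once, you could wrap the same logic in a helper and use apply(); this assumes the review text lives in the reviewBody column:

def keep_alphanumeric(text):
    # Keep letters, digits, and spaces; drop everything else
    return ''.join(char for char in text if char.isalnum() or char == ' ')

df['reviewBody'] = df['reviewBody'].astype(str).apply(keep_alphanumeric)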
Remove Stopwords
To remove stopwords, you can use the NLTK library, which is a toolkit for Natural Language Processing.
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

nltk.download('punkt')       # tokenizer models (needed once)
nltk.download('stopwords')   # stop word list (needed once)

tokenized_review = word_tokenize(cleaned_text)
stop_words = set(stopwords.words('english'))
filtered_review = [word for word in tokenized_review if word not in stop_words]
This code first splits the review text into tokens, which are easy-to-process units, usually words. It then retrieves NLTK’s list of English stop words and filters them out of the tokenized review.
4. Sentiment Analysis
Sentiment analysis determines whether a review is positive, negative, or neutral, letting you categorize and analyze Google customer reviews individually. By studying them separately, you gain deeper insights. For instance, analyzing negative reviews reveals weaknesses, while positive reviews highlight strengths.
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # lexicon used by the analyzer (needed once)

# Initialize the sentiment analyzer
sia = SentimentIntensityAnalyzer()

# Function to classify sentiment
def classify_sentiment(review):
    score = sia.polarity_scores(review)['compound']
    if score >= 0.05:
        return 'positive'
    elif score <= -0.05:
        return 'negative'
    else:
        return 'neutral'

# Apply sentiment classification
df['sentiment'] = df['reviewBody'].apply(classify_sentiment)

# Separate analyses based on sentiment
positive_reviews = df[df['sentiment'] == 'positive']
negative_reviews = df[df['sentiment'] == 'negative']
neutral_reviews = df[df['sentiment'] == 'neutral']
The above code uses SentimentIntensityAnalyzer from NLTK to assess whether a review is positive, negative, or neutral and separates them accordingly for further analysis.
5. Descriptive Statistics
You can perform descriptive analysis on your Google review dataset. The analysis will summarize your dataset using metrics like review counts and average ratings.
Pandas has a describe() method that lets you perform descriptive analysis very quickly.
positive_reviews.describe()
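Beyond describe(), you can compute review counts and average ratings directly. The snippet below assumes your dataset has a numeric ratingValue column; adjust the name to match your data:

# Review counts per sentiment class
print(df['sentiment'].value_counts())

# Average rating per sentiment class (assumes a numeric 'ratingValue' column)
print(df.groupby('sentiment')['ratingValue'].mean())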
6. Trend Analysis
After categorizing the reviews, you can track the number of positive and negative reviews over time. Ideally, positive reviews should increase while negative ones decrease; you may also observe the variations during specific seasons or campaigns.
To analyze trends, create a histogram of the ‘dateCreated’ column after converting it into a datetime object:
import datetime
from dateutil.relativedelta import relativedelta

def toDateTime(date):
    date = date.replace('a ', '1 ')
    value = int(date.split()[0])
    unit = date.split()[1]
    unit_mapping = {
        'months': relativedelta(months=value),
        'days': relativedelta(days=value),
        'weeks': relativedelta(weeks=value),
        'years': relativedelta(years=value),
        'month': relativedelta(months=value),
        'day': relativedelta(days=value),
        'week': relativedelta(weeks=value),
        'year': relativedelta(years=value)
    }
    newDate = datetime.datetime.now() - unit_mapping[unit]
    return newDate
This function converts relative dates formatted as ‘<number> <unit> ago’ into absolute datetime objects, which are easier to analyze.
You can apply this function to a column using Pandas’ apply() method.
positive_reviews.loc[:, 'createdDate'] = positive_reviews['dateCreated'].apply(toDateTime)
Now that you have a new column with absolute date values, you can create a histogram representing trends using Pandas’ hist() method:
positive_reviews['createdDate'].hist(grid=False, xrot=45)
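To compare positive and negative trends on the same plot, you can also count reviews per month for each group. A minimal sketch, assuming negative_reviews carries the same ‘dateCreated’ format as the positive reviews:

import matplotlib.pyplot as plt

# Give the negative reviews the same absolute-date column
negative_reviews.loc[:, 'createdDate'] = negative_reviews['dateCreated'].apply(toDateTime)

# Count reviews per month for each sentiment
positive_monthly = positive_reviews.set_index(
    pandas.to_datetime(positive_reviews['createdDate'])).resample('M').size()
negative_monthly = negative_reviews.set_index(
    pandas.to_datetime(negative_reviews['createdDate'])).resample('M').size()

positive_monthly.plot(label='Positive')
negative_monthly.plot(label='Negative')
plt.legend()
plt.title('Monthly Review Counts by Sentiment')
plt.show()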
7. Topic Analysis
You can analyze reviews to identify common themes that help you understand which specific features of your product customers appreciate or dislike.
A straightforward method is to search for specific keywords within reviews:
def classify_topic(review):
    if 'food' in review.lower():
        return 'Food'
    elif 'service' in review.lower():
        return 'Service'
    elif 'atmosphere' in review.lower():
        return 'Atmosphere'
    return 'Other'

positive_reviews.loc[:, 'topic'] = positive_reviews['reviewBody'].apply(classify_topic)
However, this method only classifies reviews into known topics; it can’t discover new ones. Latent Dirichlet Allocation (LDA) is an advanced technique that can discover previously unknown topics.
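As a rough illustration of that idea, here is a minimal LDA sketch using scikit-learn. The number of topics (5) and the number of words printed per topic (10) are arbitrary choices you would tune for your own data:

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Vectorize the review text, ignoring English stop words
vectorizer = CountVectorizer(stop_words='english')
doc_term_matrix = vectorizer.fit_transform(positive_reviews['reviewBody'].dropna())

# Fit LDA with an arbitrary number of topics
lda = LatentDirichletAllocation(n_components=5, random_state=42)
lda.fit(doc_term_matrix)

# Print the top words for each discovered topic
words = vectorizer.get_feature_names_out()
for topic_id, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[-10:]]
    print(f"Topic {topic_id}: {', '.join(top_words)}")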
8. Keyword Extraction
Keyword extraction allows you to identify frequently mentioned keywords and phrases useful for SEO purposes.
Here’s how you can extract keywords from reviews:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(stop_words='english')
keywordMatrix = vectorizer.fit_transform(positive_reviews['reviewBody'].dropna())
# Get feature names and their counts
keywordDataFrame = pandas.DataFrame(keywordMatrix.toarray(), columns=vectorizer.get_feature_names_out())
occurrences = keywordDataFrame.sum().sort_values(ascending=False)
print(occurrences.head(10))
The code first vectorizes the reviews and creates a matrix representation of words and their occurrences while ignoring stopwords.
It then converts this matrix into a Pandas DataFrame for easier counting of total occurrences per keyword using .sum().
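The CountVectorizer call above counts single words only. To capture short phrases as well, you can pass an ngram_range; a small variation that counts two-word phrases might look like this:

# Count two-word phrases instead of single words
phrase_vectorizer = CountVectorizer(stop_words='english', ngram_range=(2, 2))
phraseMatrix = phrase_vectorizer.fit_transform(positive_reviews['reviewBody'].dropna())

phraseCounts = pandas.DataFrame(
    phraseMatrix.toarray(), columns=phrase_vectorizer.get_feature_names_out()
).sum().sort_values(ascending=False)
print(phraseCounts.head(10))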
9. Competitor Benchmarking
Compare your review sentiments and themes against those of competitors to identify strengths and weaknesses in your offerings.
competitor_data = pandas.read_json('competitorReviews.json') # Load competitor reviews
# Merge or concatenate with your existing DataFrame for comparison
combined_data = pandas.concat([df, competitor_data], keys=['Self', 'Competitor'])
Now you can perform all the same operations on this combined DataFrame and compare the two businesses using the keys.
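For instance, assuming the competitor file has the same reviewBody and ratingValue columns, you could compare the two businesses side by side through the outer index keys:

# Classify sentiment for both businesses in one pass
combined_data['sentiment'] = combined_data['reviewBody'].apply(classify_sentiment)

# Compare average rating and sentiment mix between 'Self' and 'Competitor'
print(combined_data.groupby(level=0)['ratingValue'].mean())
print(combined_data.groupby(level=0)['sentiment'].value_counts(normalize=True))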
10. Visualization
Data visualization effectively communicates the results. For instance, you can use Python libraries for data visualization, like Matplotlib or Seaborn, to visualize your results as bar graphs, line graphs, etc.
import matplotlib.pyplot as plt
import seaborn as sns
# Visualize sentiment distribution
sns.countplot(x='sentiment', data=df)
plt.title('Sentiment Distribution')
plt.show()
The above code visualizes counts of positive, negative, and neutral sentiments.
You may also create a word cloud for extracted keywords to visualize their frequency.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Combine all review text into a single string
text = ' '.join(df['reviewBody'].dropna())

# Generate a word cloud using wordcloud.
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

# Plot the word cloud using matplotlib.
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
Tips for Analyzing Google Reviews
Here are some tips for analyzing Google reviews:
- Focus on negative reviews: Focusing on negative reviews helps identify areas needing improvement; addressing these issues enhances customer experience and reduces negative feedback over time.
- Ensure data quality and authenticity: For accurate results, it’s crucial to analyze well-cleaned datasets; missing values or duplicates compromise insights during sentiment or descriptive analyses.
- Use a Data Pipeline: Since the workflow is generally similar across analyses, employing a pipeline that gathers data, performs analysis, and visualizes the results prevents unnecessary coding errors.
- Use Multiple Models for Sentiment Analysis: Although high-quality AI models can figure out the sentiment of Google reviews, they struggle with contextual understanding. Therefore, it is better to compare the results from multiple sentiment models, as sketched below.
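As a rough sketch of that comparison, you could run NLTK’s VADER alongside TextBlob (a separate library installed with pip install textblob) and inspect the reviews where the two disagree; the column names are only illustrative:

import pandas
from nltk.sentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

sia = SentimentIntensityAnalyzer()

def compare_models(review):
    # VADER's compound score and TextBlob's polarity both range roughly from -1 to 1
    vader_score = sia.polarity_scores(review)['compound']
    textblob_score = TextBlob(review).sentiment.polarity
    return vader_score, textblob_score

df[['vader', 'textblob']] = df['reviewBody'].apply(
    lambda review: pandas.Series(compare_models(review))
)

# Reviews where the two models disagree on the sign deserve a closer look
disagreements = df[(df['vader'] > 0.05) & (df['textblob'] < -0.05)]
print(disagreements[['reviewBody', 'vader', 'textblob']].head())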
Not sure how Google reviews can help you? Read this article on scraping Google reviews.
Get Reviews Using ScrapeHero Cloud
The Google Review Scraper from ScrapeHero Cloud lets you retrieve reviews efficiently with just a few clicks. Here are the steps:
- Create an account on ScrapeHero Cloud.
- Go to the app store and find Google Review Scraper.
- Click ‘Create New Project’
- Enter Google review URLs or place IDs
- Select the sort type: Most relevant, Newest, Highest Rating, Lowest Rating
- Click ‘Gather Data’
Wait until the scraper finishes downloading Google reviews; then:
- Go to ‘My Projects’ under ‘Projects’
- Select the project
- Click download
Optionally, you can also schedule the scraper and have the data delivered to you.
How Can a Web Scraping Service Help You?
You should now have a solid understanding of how to analyze Google reviews: collecting, processing, and visualizing the results. However, you don’t need to scrape and analyze Google reviews on your own. A web scraping service can help with data collection.
A service like ScrapeHero can provide top-notch Google reviews, allowing you to set aside the technicalities of data collection and focus on the analyses. ScrapeHero is a fully managed web scraping service provider capable of building enterprise-grade web scrapers. Our services also include large-scale crawling and custom RPA solutions.