Understand your customers better with Google Reviews.
Analyzing Google reviews is an effective way to gain insights into customer experiences and preferences. This guide explains how to analyze Google reviews using Python, detailing the tools and techniques necessary for data interpretation.
You will learn to use libraries like Pandas and NLTK and visualize your results using tools like Matplotlib.
Steps to Analyze Google Reviews
1. Data Collection
For your own business, you can use the Google My Business API to get reviews. However, to get reviews of other companies, you need to scrape Google Maps.
You can scrape Google reviews in three ways:
1. Writing Code: You can write a script in a programming language that fetches and parses HTML code. While most programming languages can be used for this purpose, Python is the most popular because of its extensive community support and numerous dedicated scraping libraries. Every web scraping script has three parts (a minimal sketch follows this list):
- Fetching the HTML code using libraries like Python’s requests or urllib
- Parsing the HTML code to extract the necessary data with parsers like BeautifulSoup or lxml
- Saving the extracted data with modules like json and csv
2. Using Ready-Made Scrapers: You can use ready-made scrapers like Google Review Scraper from ScrapeHero Cloud, which offers a no-code solution. Within just a few clicks, you can obtain the required data:
- Enter the review URLs
- Save the settings and click ‘Gather Data’
- Download the scraped data
3. Using Web Scraping APIs: You can also use a web scraping API that connects to a ready-made web scraper mentioned above. A web scraping API integrates more seamlessly with your workflow than directly using a ready-made scraper and is more straightforward than writing your own code. An example is the Google Review Scraper API from ScrapeHero Cloud:
- Sign up for a premium membership on ScrapeHero Cloud and get your API key
- Make an HTTP request to the API with the review URL or place ID along with your key
- Save the required data from the response
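To illustrate the fetch-parse-save structure of the first method, here is a minimal sketch. The URL, the CSS selectors, and the field names are placeholders, not real Google Maps selectors; in practice, Google Maps renders reviews with JavaScript, so you would typically need browser automation or one of the ready-made options above rather than a plain requests call.

import csv
import requests
from bs4 import BeautifulSoup

# Hypothetical URL -- real Google Maps pages render reviews with JavaScript
url = 'https://example.com/business/reviews'

# 1. Fetch the HTML
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
response.raise_for_status()

# 2. Parse the HTML and extract the fields you need (selectors are placeholders)
soup = BeautifulSoup(response.text, 'lxml')
reviews = []
for block in soup.select('div.review'):
    reviews.append({
        'reviewBody': block.select_one('.text').get_text(strip=True),
        'ratingValue': block.select_one('.rating').get_text(strip=True),
        'dateCreated': block.select_one('.date').get_text(strip=True),
    })

# 3. Save the extracted data
with open('reviews.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['reviewBody', 'ratingValue', 'dateCreated'])
    writer.writeheader()
    writer.writerows(reviews)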
This article on how to scrape Google Reviews provides detailed explanations of all these methods.
Once you have the reviews, you can use either a programming language to analyze them or opt for a ready-made Google review analyzer. This guide will tell you how to analyze Google reviews using Python.
2. Data Loading
You can use Pandas to load the data, with the method depending on the data format; for instance,
- Use read_csv() to read CSV files
import pandas
df = pandas.read_csv('reviews.csv')
- Use read_json() to read JSON files
df = pandas.read_json('reviews.json')
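After loading, it helps to take a quick look at what actually came in. The column names used throughout this guide (reviewBody, ratingValue, dateCreated) are assumptions that depend on how your scraper saved the data, so confirm them here:

# Inspect the first rows, the column names, and the data types
print(df.head())
print(df.info())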
3. Data Cleaning
Cleaning your data before analysis is essential for ensuring high-quality results. Duplicate entries and missing values can lead to inaccurate conclusions.
Additionally, for sentiment analysis, you must prepare the text by removing stop words and converting it to lowercase.
Here are several methods for cleaning the extracted Google reviews:
- Remove Duplicates
- Fix Missing Values
- Convert Review Text to Lowercase
- Remove Non-Alphanumeric Characters
- Remove Stop Words
Remove Duplicates
You can remove duplicates using the drop_duplicates() method.
df = df.drop_duplicates()
Fix Missing Values
There are two common approaches for handling missing values:
- Dropping missing values using dropna()
df = df.dropna()
- Filling with a constant value
df = df.fillna(0)
Convert to Lowercase
Converting to lowercase ensures that AI models do not differentiate between words that differ only in case. You can convert text using str.lower():
df['reviewBody'] = df['reviewBody'].str.lower()
Remove Non-Alphanumeric Characters
Non-alphanumeric characters are unnecessary for sentiment analysis; removing them improves efficiency:
cleaned_text = ''.join(char for char in reviewText if char.isalnum() or char == ' ')
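The snippet above cleans a single string (reviewText stands for one review). To clean every review in the DataFrame at once, you could wrap the same logic in a helper and use apply(); this assumes the review text lives in the reviewBody column:

def keep_alphanumeric(text):
    # Keep letters, digits, and spaces; drop everything else
    return ''.join(char for char in text if char.isalnum() or char == ' ')

df['reviewBody'] = df['reviewBody'].astype(str).apply(keep_alphanumeric)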
Remove Stopwords
To remove stopwords, you can use the NLTK library, which is a toolkit for Natural Language Processing.
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

nltk.download('punkt')       # tokenizer models (needed once)
nltk.download('stopwords')   # stop word list (needed once)

tokenized_review = word_tokenize(cleaned_text)
stop_words = set(stopwords.words('english'))
filtered_review = [word for word in tokenized_review if word not in stop_words]
This code first splits the review text into tokens, which are easy-to-process units, usually words. It then retrieves NLTK’s list of English stop words and filters them out of the tokenized review.
4. Sentiment Analysis
Sentiment analysis determines whether a review is positive, negative, or neutral, letting you categorize and analyze Google customer reviews individually. By studying them separately, you gain deeper insights. For instance, analyzing negative reviews reveals weaknesses, while positive reviews highlight strengths.
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # lexicon used by the analyzer (needed once)

# Initialize the sentiment analyzer
sia = SentimentIntensityAnalyzer()

# Function to classify sentiment
def classify_sentiment(review):
    score = sia.polarity_scores(review)['compound']
    if score >= 0.05:
        return 'positive'
    elif score <= -0.05:
        return 'negative'
    else:
        return 'neutral'

# Apply sentiment classification
df['sentiment'] = df['reviewBody'].apply(classify_sentiment)

# Separate analyses based on sentiment
positive_reviews = df[df['sentiment'] == 'positive']
negative_reviews = df[df['sentiment'] == 'negative']
neutral_reviews = df[df['sentiment'] == 'neutral']
The above code uses SentimentIntensityAnalyzer from NLTK to assess whether a review is positive, negative, or neutral and separates them accordingly for further analysis.
5. Descriptive Statistics
You can perform descriptive analysis on your Google review dataset. The analysis will summarize your dataset using metrics like review counts and average ratings.
Pandas has a describe() method that lets you perform descriptive analysis very quickly.
positive_reviews.describe()
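Beyond describe(), you can compute review counts and average ratings directly. The snippet below assumes your dataset has a numeric ratingValue column; adjust the name to match your data:

# Review counts per sentiment class
print(df['sentiment'].value_counts())

# Average rating per sentiment class (assumes a numeric 'ratingValue' column)
print(df.groupby('sentiment')['ratingValue'].mean())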
6. Trend Analysis
After categorizing the reviews, you can track the number of positive and negative reviews over time. Ideally, positive reviews should increase while negative ones decrease; you may also observe the variations during specific seasons or campaigns.
To analyze trends, create a histogram of the ‘dateCreated’ column after converting it into a datetime object:
import datetime
from dateutil.relativedelta import relativedelta

def toDateTime(date):
    date = date.replace('a ', '1 ')
    value = int(date.split()[0])
    unit = date.split()[1]
    unit_mapping = {
        'months': relativedelta(months=value),
        'days': relativedelta(days=value),
        'weeks': relativedelta(weeks=value),
        'years': relativedelta(years=value),
        'month': relativedelta(months=value),
        'day': relativedelta(days=value),
        'week': relativedelta(weeks=value),
        'year': relativedelta(years=value)
    }
    newDate = datetime.datetime.now() - unit_mapping[unit]
    return newDate
This function converts relative dates formatted as ‘<number> <unit> ago’ into absolute datetime objects, which are easier to analyze.
You can apply this function to a column using Pandas’ apply() method.
positive_reviews.loc[:, 'createdDate'] = positive_reviews['dateCreated'].apply(toDateTime)
Now that you have a new column with absolute date values, you can create a histogram representing trends using Pandas’ hist() method:
positive_reviews['createdDate'].hist(grid=False, xrot=45)
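To compare positive and negative trends on the same plot, you can also count reviews per month for each group. A minimal sketch, assuming negative_reviews carries the same ‘dateCreated’ format as the positive reviews:

import matplotlib.pyplot as plt

# Give the negative reviews the same absolute-date column
negative_reviews.loc[:, 'createdDate'] = negative_reviews['dateCreated'].apply(toDateTime)

# Count reviews per month for each sentiment
positive_monthly = positive_reviews.set_index(
    pandas.to_datetime(positive_reviews['createdDate'])).resample('M').size()
negative_monthly = negative_reviews.set_index(
    pandas.to_datetime(negative_reviews['createdDate'])).resample('M').size()

positive_monthly.plot(label='Positive')
negative_monthly.plot(label='Negative')
plt.legend()
plt.title('Monthly Review Counts by Sentiment')
plt.show()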
7. Topic Analysis
You can analyze reviews to identify common themes that help you understand which specific features of your product customers appreciate or dislike.
A straightforward method is to search for specific keywords within reviews:
def classify_topic(review):
    if 'food' in review.lower():
        return 'Food'
    elif 'service' in review.lower():
        return 'Service'
    elif 'atmosphere' in review.lower():
        return 'Atmosphere'
    return 'Other'

positive_reviews.loc[:, 'topic'] = positive_reviews['reviewBody'].apply(classify_topic)
However, this method only classifies reviews into known topics; it can’t discover new ones. Latent Dirichlet Allocation (LDA) is an advanced technique that can discover previously unknown topics.
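As a rough illustration of that idea, here is a minimal LDA sketch using scikit-learn. The number of topics (5) and the number of words printed per topic (10) are arbitrary choices you would tune for your own data:

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Vectorize the review text, ignoring English stop words
vectorizer = CountVectorizer(stop_words='english')
doc_term_matrix = vectorizer.fit_transform(positive_reviews['reviewBody'].dropna())

# Fit LDA with an arbitrary number of topics
lda = LatentDirichletAllocation(n_components=5, random_state=42)
lda.fit(doc_term_matrix)

# Print the top words for each discovered topic
words = vectorizer.get_feature_names_out()
for topic_id, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[-10:]]
    print(f"Topic {topic_id}: {', '.join(top_words)}")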
8. Keyword Extraction
Keyword extraction allows you to identify frequently mentioned keywords and phrases useful for SEO purposes.
Here’s how you can extract keywords from reviews:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(stop_words='english')
keywordMatrix = vectorizer.fit_transform(positive_reviews['reviewBody'].dropna())
# Get feature names and their counts
keywordDataFrame = pandas.DataFrame(keywordMatrix.toarray(), columns=vectorizer.get_feature_names_out())
occurrences = keywordDataFrame.sum().sort_values(ascending=False)
print(occurrences.head(10))
The code first vectorizes the reviews and creates a matrix representation of words and their occurrences while ignoring stopwords.
It then converts this matrix into a Pandas DataFrame for easier counting of total occurrences per keyword using .sum().
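The CountVectorizer call above counts single words only. To capture short phrases as well, you can pass an ngram_range; a small variation that counts two-word phrases might look like this:

# Count two-word phrases instead of single words
phrase_vectorizer = CountVectorizer(stop_words='english', ngram_range=(2, 2))
phraseMatrix = phrase_vectorizer.fit_transform(positive_reviews['reviewBody'].dropna())

phraseCounts = pandas.DataFrame(
    phraseMatrix.toarray(), columns=phrase_vectorizer.get_feature_names_out()
).sum().sort_values(ascending=False)
print(phraseCounts.head(10))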
9. Competitor Benchmarking
Compare your review sentiments and themes against those of competitors to identify strengths and weaknesses in your offerings.
competitor_data = pandas.read_json('competitorReviews.json') # Load competitor reviews
# Merge or concatenate with your existing DataFrame for comparison
combined_data = pandas.concat([df, competitor_data], keys=['Self', 'Competitor'])
Now you can perform all the same operations on this combined DataFrame and compare the two businesses using the keys.
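For instance, assuming the competitor file has the same reviewBody and ratingValue columns, you could compare the two businesses side by side through the outer index keys:

# Classify sentiment for both businesses in one pass
combined_data['sentiment'] = combined_data['reviewBody'].apply(classify_sentiment)

# Compare average rating and sentiment mix between 'Self' and 'Competitor'
print(combined_data.groupby(level=0)['ratingValue'].mean())
print(combined_data.groupby(level=0)['sentiment'].value_counts(normalize=True))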
10. Visualization
Data visualization effectively communicates the results. For instance, you can use Python libraries for data visualization, like Matplotlib or Seaborn, to visualize your results as bar graphs, line graphs, etc.
import matplotlib.pyplot as plt
import seaborn as sns
# Visualize sentiment distribution
sns.countplot(x='sentiment', data=df)
plt.title('Sentiment Distribution')
plt.show()
The above code visualizes counts of positive, negative, and neutral sentiments.
You may also create a word cloud for extracted keywords to visualize their frequency.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Combine all review text into a single string
text = ' '.join(df['reviewBody'].dropna())

# Generate a word cloud using wordcloud.
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

# Plot the word cloud using matplotlib.
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
Tips for Analyzing Google Reviews
Here are some tips for analyzing Google reviews:
- Focus on negative reviews: Focusing on negative reviews helps identify areas needing improvement; addressing these issues enhances customer experience and reduces negative feedback over time.
- Ensure data quality and authenticity: For accurate results, it’s crucial to analyze well-cleaned datasets; missing values or duplicates compromise insights during sentiment or descriptive analyses.
- Use a Data Pipeline: Since the workflow is generally similar across analyses, employing a pipeline that gathers data, performs analysis, and visualizes the results prevents unnecessary coding errors.
- Use Multiple Models for Sentiment Analysis: Although high-quality AI models can figure out the sentiment of Google reviews, they struggle with contextual understanding. Therefore, it is better to compare the results from multiple sentiment models, as sketched below.
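As a rough sketch of that comparison, you could run NLTK’s VADER alongside TextBlob (a separate library installed with pip install textblob) and inspect the reviews where the two disagree; the column names are only illustrative:

import pandas
from nltk.sentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

sia = SentimentIntensityAnalyzer()

def compare_models(review):
    # VADER's compound score and TextBlob's polarity both range roughly from -1 to 1
    vader_score = sia.polarity_scores(review)['compound']
    textblob_score = TextBlob(review).sentiment.polarity
    return vader_score, textblob_score

df[['vader', 'textblob']] = df['reviewBody'].apply(
    lambda review: pandas.Series(compare_models(review))
)

# Reviews where the two models disagree on the sign deserve a closer look
disagreements = df[(df['vader'] > 0.05) & (df['textblob'] < -0.05)]
print(disagreements[['reviewBody', 'vader', 'textblob']].head())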
Not sure how Google reviews can help you? Read this article on scraping Google reviews.
Get Reviews Using ScrapeHero Cloud
The Google Review Scraper from ScrapeHero Cloud lets you retrieve reviews efficiently with just a few clicks. Here are the steps:
- Create an account on ScrapeHero Cloud.
- Go to the app store and find Google Review Scraper.
- Click ‘Create New Project’
- Enter Google review URLs or place IDs
- Select the sort type: Most relevant, Newest, Highest Rating, Lowest Rating
- Click ‘Gather Data’
Wait until the scraper finishes downloading Google reviews; then:
- Go to ‘My Projects’ under ‘Projects’
- Select the project
- Click download
Optionally, you can also schedule the scraper and have the data delivered to you.
How Can a Web Scraping Service Help You?
You should now have a solid understanding of how to analyze Google reviews: collecting, processing, and visualizing the results. However, you don’t need to scrape and analyze Google reviews on your own. A web scraping service can help with data collection.
A service like ScrapeHero can provide top-notch Google reviews, allowing you to set aside the technicalities of data collection and focus on the analyses. ScrapeHero is a fully managed web scraping service provider capable of building enterprise-grade web scrapers. Our services also include large-scale crawling and custom RPA solutions.