Google Trends is a tool that analyzes the popularity of top search queries in Google Search across various regions and languages using real-time data.
By scraping Google Trends, you can collect details on the latest search patterns and trends, which can inform your marketing, product, and business decisions.
This article is a step-by-step guide on scraping Google Trends data by building a custom scraper using Playwright.
We also discuss how you can fetch Google Trends data with Pytrends, an unofficial API wrapper for Google Trends.
Scraping Google Trends Data Using Playwright
Initial Requirements
Before you begin scraping Google Trends with Playwright, make sure that you install the required dependencies.
- Install the Pytest plugin for Playwright, which also installs the Playwright library as a dependency
pip install pytest-playwright
- Install the browsers (Chromium, Firefox, and WebKit) that Playwright automates
playwright install
- Install lxml library for parsing and manipulating HTML/XML in Python
pip install lxml
Building a Custom Scraper With Playwright to Scrape Google Trends
- Import necessary libraries to handle CSV files, asynchronous operations, HTML parsing, and browser automation using Playwright
import csv
import asyncio
from lxml import html
from playwright.async_api import Playwright, async_playwright
- Define an asynchronous function run that uses Playwright to control the browser
async def run(playwright: Playwright) -> None:
- Launch a Chromium browser in non-headless mode, create a new browser context, and open a new page (tab) for web navigation
    browser = await playwright.chromium.launch(headless=False)
    context = await browser.new_context()
    page = await context.new_page()
- Now navigate to the Google Trends page and wait up to 30 seconds for the page to load fully
    await page.goto("https://trends.google.com/trending?geo=US")
    await page.wait_for_load_state(timeout=30000)
- Fetch the HTML content of the page and parse it using lxml to extract relevant elements
    response = await page.content()
    print(len(response))
    tree = html.fromstring(response)
- Use XPath to locate all the rows that contain trending topics and print the number of rows found
    rows = tree.xpath('//tr[@role="row"]')
    print(len(rows))
- Now, you can extract data from each row and append it to the data list for further processing
    data = []
    for element in rows:
        row_data = element.xpath('./td/div/text()')
        row_data.append(element.xpath('./td/div/div/text()')[0])
        data.append(row_data)
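The row loop indexes `[0]` on the second XPath result, which raises an IndexError for any matched row that has no nested div (a header row, for example). The following is a minimal sketch of a more defensive variant. The sample markup and the function name extract_rows are assumptions made for illustration; the markup only mimics the structure that the XPath expressions expect and is not copied from the live page.

```python
from lxml import html

# Hypothetical markup that mimics the structure the XPath expressions assume.
SAMPLE_HTML = """
<table>
  <tr role="row"><th>Trends</th><th>Search volume</th><th>Started</th></tr>
  <tr role="row">
    <td><div>example topic</div></td>
    <td><div><div>200K+</div></div></td>
    <td><div>3 hours ago</div></td>
  </tr>
</table>
"""


def extract_rows(page_source):
    """Return one list per data row, skipping rows that lack the expected cells."""
    tree = html.fromstring(page_source)
    data = []
    for element in tree.xpath('//tr[@role="row"]'):
        # Text that sits directly inside the first-level div of each cell
        cells = [t.strip() for t in element.xpath('./td/div/text()') if t.strip()]
        # Text that sits one div deeper
        nested = [t.strip() for t in element.xpath('./td/div/div/text()') if t.strip()]
        if not cells or not nested:
            continue  # header row or unexpected layout: skip instead of crashing
        data.append(cells + [nested[0]])
    return data


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Inside the scraper, the same function could be called with the string returned by page.content().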
- Define the CSV column titles and write the extracted data into a CSV file (trending_topics.csv)
    titles = ['trends', 'started', 'volume']
    with open('trending_topics.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(titles)
        writer.writerows(data)
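Because the XPath queries can return a different number of text nodes per row, some rows may end up shorter than the three column titles. The sketch below pads short rows with empty strings so the columns stay aligned. The helper name write_rows and the sample row are assumptions for illustration, and the example writes to an in-memory buffer so it can run without touching the disk.

```python
import csv
import io


def write_rows(file, titles, rows):
    """Write a header and rows, padding short rows to the header width."""
    writer = csv.writer(file)
    writer.writerow(titles)
    for row in rows:
        # A negative multiplier yields an empty list, so longer rows pass through unchanged
        padded = list(row) + [""] * (len(titles) - len(row))
        writer.writerow(padded)


buffer = io.StringIO()
write_rows(buffer, ["trends", "started", "volume"], [["example topic", "3 hours ago"]])
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass a handle opened with open('trending_topics.csv', 'w', newline='', encoding='utf-8') instead of the buffer.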
- Define the main function, which initializes Playwright and executes the run function, and run it asynchronously
async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)
asyncio.run(main())
Complete Code to Scrape Google Trends Using Playwright
import csv
import asyncio
from lxml import html
from playwright.async_api import Playwright, async_playwright
async def run(playwright: Playwright) -> None:
    """
    Launches a Chromium browser and extracts trending topics from Google Trends.
    """
    browser = await playwright.chromium.launch(headless=False)
    # Create a new browser context
    context = await browser.new_context()
    # Open a new page in the browser context
    page = await context.new_page()
    # Navigate to the Google Trends page for the US
    await page.goto("https://trends.google.com/trending?geo=US")
    await page.wait_for_load_state(timeout=30000)
    # Get the page content as a string
    response = await page.content()
    print(len(response))
    # Parse the HTML content using lxml
    tree = html.fromstring(response)
    # Extract all row elements containing trending topics
    rows = tree.xpath('//tr[@role="row"]')
    print(len(rows))
    # Initialize a list to store the extracted data
    data = []
    # Iterate over each row and extract relevant data
    for element in rows:
        row_data = element.xpath('./td/div/text()')
        row_data.append(element.xpath('./td/div/div/text()')[0])
        data.append(row_data)
    # Define the column titles for the CSV
    titles = ['trends', 'started', 'volume']
    # Write the extracted data to a CSV file
    with open('trending_topics.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(titles)
        writer.writerows(data)
    # Close the browser context and browser
    await context.close()
    await browser.close()

async def main() -> None:
    """
    Runs the Playwright script.
    """
    async with async_playwright() as playwright:
        await run(playwright)

asyncio.run(main())
Scraping Google Trends Data With Pytrends
You can also scrape Google Trends data with Pytrends, an unofficial API wrapper for Google Trends.
Pytrends allows you to access the trend data without needing browser automation, simplifying querying and retrieving data directly in a structured format.
- Install pytrends by running the following command in your terminal
pip install pytrends
- Import pytrends and pandas
from pytrends.request import TrendReq
import pandas as pd
Here, TrendReq is the Pytrends class that connects to Google Trends.
- Set up Pytrends connection to start interacting with Google Trends
pytrends = TrendReq(hl='en-US', tz=360)
Note that hl='en-US' sets the language of the results, which is English as used in the United States, and tz=360 sets the timezone offset from GMT in minutes (360 corresponds to US Central Standard Time).
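The sign of tz can be confusing: as far as the Pytrends documentation describes it, the value counts minutes west of UTC, so US Central Standard Time (UTC-6) is 360 rather than -360. The helper below is a small sketch of that conversion; the function name trends_tz is hypothetical and is not part of Pytrends.

```python
from datetime import timedelta


def trends_tz(utc_offset: timedelta) -> int:
    """Convert a UTC offset into minutes west of UTC (the sign convention assumed for tz)."""
    return round(-utc_offset.total_seconds() / 60)


cst = trends_tz(timedelta(hours=-6))             # US Central Standard Time
ist = trends_tz(timedelta(hours=5, minutes=30))  # India Standard Time
print(cst, ist)
```

The result can then be passed as the tz argument when creating TrendReq.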
- Define the keywords you want to search for and fetch data
keywords = ['Python', 'Java', 'JavaScript']
pytrends.build_payload(keywords, cat=0, timeframe='today 12-m', geo='', gprop='')
The build_payload method sets up the search. The parameters are:
- cat=0 (category of interest; 0 means all categories),
- timeframe='today 12-m' (data for the past 12 months),
- geo='' (empty means global data),
- gprop='' (empty means regular web search rather than a specific Google property, like 'news' or 'images').
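The first argument to build_payload is the keyword list. Google Trends compares at most five terms at a time, so a longer list has to be split into batches. The following sketch shows one way to do that; chunk_keywords is a hypothetical helper, and the extra keywords are made up for the example.

```python
def chunk_keywords(keywords, size=5):
    """Split a keyword list into batches of at most `size` terms."""
    return [keywords[i:i + size] for i in range(0, len(keywords), size)]


batches = chunk_keywords(
    ["Python", "Java", "JavaScript", "Go", "Rust", "Kotlin", "Swift"]
)
print(batches)
```

Each batch could then be passed to build_payload in turn. Keep in mind that Google Trends scales every request to its own 0 to 100 range, so values from different batches are not directly comparable unless the batches share a common reference keyword.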
- Retrieve the interest over time data for the specified keywords
interest_over_time_df = pytrends.interest_over_time()
- Print the data
print(interest_over_time_df.head())
The head() method displays the first five rows of the data by default, which gives you a quick look at the results.
- Save the data to a CSV file for later use
interest_over_time_df.to_csv('google_trends_data.csv')
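Once the data is saved, it can be read back for simple analysis without Pytrends. The sketch below averages the interest per keyword and skips rows flagged in the isPartial column, which marks periods that are still incomplete. The sample CSV text is made-up data that follows the layout the saved file is expected to have (a date column, one column per keyword, and isPartial); the function name average_interest is hypothetical.

```python
import csv
import io

# Made-up sample in the layout expected from google_trends_data.csv
SAMPLE_CSV = (
    "date,Python,Java,JavaScript,isPartial\n"
    "2024-01-07,80,40,60,False\n"
    "2024-01-14,90,50,70,False\n"
    "2024-01-21,10,5,5,True\n"
)


def average_interest(file, keywords):
    """Average the interest per keyword, ignoring rows marked as partial."""
    totals = {keyword: 0 for keyword in keywords}
    count = 0
    for row in csv.DictReader(file):
        if row.get("isPartial") == "True":
            continue  # incomplete period: leave it out of the average
        for keyword in keywords:
            totals[keyword] += int(row[keyword])
        count += 1
    if count == 0:
        return {}
    return {keyword: totals[keyword] / count for keyword in keywords}


averages = average_interest(io.StringIO(SAMPLE_CSV), ["Python", "Java", "JavaScript"])
print(averages)
```

For the real file, replace the in-memory buffer with open('google_trends_data.csv', newline='').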
Complete Code to Scrape Google Trends With Pytrends
from pytrends.request import TrendReq
import pandas as pd
# Set up Pytrends connection
pytrends = TrendReq(hl='en-US', tz=360)
# Define keywords and fetch data
keywords = ['Python', 'Java', 'JavaScript']
pytrends.build_payload(keywords, cat=0, timeframe='today 12-m', geo='', gprop='')
# Retrieve interest over time
interest_over_time_df = pytrends.interest_over_time()
# Display data
print(interest_over_time_df.head())
# Save data to CSV (optional)
interest_over_time_df.to_csv('google_trends_data.csv')
Why Should You Scrape Data From Google Trends?
Google Trends is a valuable data source for businesses, marketers, and researchers.
Some of the key reasons to scrape Google Trends are:
- Market Research
- Product Development
- Geographic Insights
- Performance Tracking
- Data-Driven Decisions
1. Market Research
Google Trends data can help you understand emerging trends in consumer behavior and compare the search interest in your brand with that of your competitors.
2. Product Development
By scraping Google Trends, you can identify consumer preferences related to products or services and gather insights into what users are looking for.
3. Geographic Insights
With Google Trends data, you can discover geographic regions with higher search interest for specific keywords and adjust your marketing strategies based on this information.
4. Performance Tracking
You can track the changing search interest in your keywords over time and compare the performance of your products against industry benchmarks with Google Trends data.
5. Data-Driven Decisions
Google Trends data helps you make data-driven decisions based on real-time and historical search trends and plan long-term business strategies.
Wrapping Up
Google Trends is a valuable tool for understanding customer behavior and expectations.
Enterprises can use the data extracted from Google Trends to gain a competitive edge in the market. The code explained in this article is ideal for small-scale data extraction.
If you are an enterprise with more extensive data requirements, consider a full web scraping service. At that scale, you are more likely to encounter web scraping challenges, including anti-scraping technologies and legal issues.
For enterprises with significant data needs, a dependable data partner like ScrapeHero is essential. Our custom web scrapers are designed to overcome the challenges of web scraping and ensure the job is done to perfection.
We are an enterprise-grade web scraping service with a 98% customer retention rate. We provide high-quality data services and fulfill all our customers’ data needs.
Frequently Asked Questions
The legality of web scraping depends on several factors: the use of data, adherence to privacy laws, respect for website terms of service, and the impact on website performance.
Using the Pytrends library, you can create a Google Trends scraper in Python that fetches trend data programmatically.
Google does not provide an official Google Trends API. Pytrends is an unofficial API that allows you to query Google Trends data through Python.
Google doesn’t explicitly state the rate limits for Trends. However, an increase in the number of queries can lead to temporary blocks.
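Since temporary blocks tend to follow bursts of queries, one common mitigation is to retry a failed request after a growing delay. The following is a minimal sketch of exponential backoff. The function name fetch_with_backoff, the retry count, and the delays are assumptions, and the sketch catches a generic Exception because the exact error raised depends on the library and its version.

```python
import time


def fetch_with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying after exponentially growing delays when it fails."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in request that fails twice before succeeding
calls = {"count": 0}
delays = []


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporarily blocked")
    return "ok"


# Recording the delays instead of sleeping keeps the demonstration instant
result = fetch_with_backoff(flaky_fetch, sleep=delays.append)
print(result, delays)
```

With Pytrends, the fetch argument could be a callable such as lambda: pytrends.interest_over_time(), with a larger base delay.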