Google Maps is a highly dynamic website, which makes scraping POI (point of interest) data challenging. It is still possible: you can use Python and Selenium to navigate Google Maps, render JavaScript, and extract the necessary data.
This tutorial shows you how to scrape POI data from Google Maps.
Data Scraped From Google Maps
The tutorial scrapes POI data from Google Maps across six categories:
- Banks
- Car Washes
- Clinics
- Stores
- Hotels
- Pharmacies
For each point of interest, the code extracts six data points:
- Name
- Rating
- Review Count
- Address
- Phone Number
- Website
You need to analyze the HTML code of the Google Maps search results page to find selectors that uniquely locate these data points. Once you do that, you can begin setting up the environment.
Scrape Google Maps POI Data: The Environment
The tutorial requires three external libraries, which you can install using pip:
- Selenium: Enables interaction with web pages, execution of JavaScript, and data extraction.
- BeautifulSoup: Offers intuitive methods for extracting data from HTML code.
- Geopy: Provides the latitude and longitude for a given location.
pip install selenium beautifulsoup4 geopy
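Before moving on, it can help to confirm that the three packages are importable. The following is a minimal sketch rather than part of the tutorial's script; the helper name missing_packages is hypothetical. Note that the check uses import names, so beautifulsoup4 appears as bs4.

```python
import importlib.util


def missing_packages(modules):
    """Return the import names from `modules` that Python cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Import names, not pip names: beautifulsoup4 is imported as bs4.
missing = missing_packages(["selenium", "bs4", "geopy"])
print("Missing packages:", missing or "none")
```

If the list is not empty, rerun the pip command above for the packages it names.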
Scrape Google Maps POI Data: The Code
1. Import Packages
Start by importing necessary modules or classes from the aforementioned packages.
from selenium import webdriver
from selenium.webdriver.common.by import By
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup
import json, time
In this code snippet:
- webdriver: Controls the Selenium browser.
- By: Specifies the selector type for data extraction.
- Nominatim: Retrieves latitude and longitude for a location.
- json: Saves the extracted data as a JSON file.
- time: Provides the sleep() function that pauses the script execution for a specified duration.
2. Define Functions
Define three functions:
- getElements(): Returns the HTML code of the elements containing POIs.
- extractDetails(): Extracts required data points from the HTML elements.
- getData(): Calls the above two functions and saves the extracted data as a JSON file.
Let’s look at the functions in detail.
getElements()
The function takes a category, latitude, and longitude as inputs and returns an array containing the HTML code of POI listings for that specific location.
Begin by launching the Selenium browser with the options object defined at the top of the complete script (headless mode and a custom user agent):
browser = webdriver.Chrome(options=options)
Construct the URL of the page containing POI listings, which includes the category, latitude, and longitude:
url = f"https://www.google.com/maps/search/{category}/@{lat},{long}"
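Some categories, such as "car washes", contain spaces. Browsers generally tolerate this, but percent-encoding the category makes the URL explicit and safe to log or reuse elsewhere. Here is a small sketch of that idea using only the standard library; the helper name build_search_url and the sample coordinates are illustrative, not part of the original code.

```python
from urllib.parse import quote


def build_search_url(category, lat, long):
    """Build a Google Maps search URL with a percent-encoded category."""
    # quote() turns spaces into %20 and escapes other unsafe characters.
    return f"https://www.google.com/maps/search/{quote(category)}/@{lat},{long}"


# Illustrative coordinates only.
print(build_search_url("car washes", 40.8568, -74.1285))
```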
Navigate to the URL using the get() method of the Selenium webdriver, then pause execution for 3 seconds to give the required elements time to load:
browser.get(url)
time.sleep(3)
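A fixed pause can be too short on a slow connection and wastefully long on a fast one. One common alternative is to poll for a condition until it holds or a timeout expires. The sketch below shows the idea with a hypothetical wait_until helper built on the standard library; Selenium also ships a WebDriverWait class designed for this purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

In the scraper, the condition could be a lambda that calls browser.find_elements() with the results-panel XPath, since find_elements() returns an empty list until a matching element exists.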
Locate the div element containing the listings to find the elements holding the POI data:
results = browser.find_element(By.XPATH,f'//div[@aria-label="Results for {category}"]')
Since the page uses lazy-loading to load the POI elements, you need to scroll. Cap the number of scrolls while trying to load more than ten elements:
# Collect the HTML up front so the result is a list of strings even if no scrolling is needed
listings = [el.get_attribute('outerHTML') for el in results.find_elements(By.CLASS_NAME, "lI9IFe")]
linkCount = len(listings)
i = 1
while linkCount <= 10 and i < 20:
    try:
        # Scroll the results panel to its bottom to trigger lazy-loading
        browser.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", results)
        listings = [el.get_attribute('outerHTML') for el in results.find_elements(By.CLASS_NAME, "lI9IFe")]
        linkCount = len(listings)
        i += 1
    except Exception as e:
        print(e)
        break
The loop continues until either more than ten listings are loaded or 19 scrolls have been performed.
Each iteration stores the HTML code of the loaded listings in a list, which is returned after the loop completes:
return listings
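The two stopping conditions of the scroll loop can be tested without a browser by separating the loop from Selenium. The following sketch uses a hypothetical scroll_until helper and a simulated feed; the defaults mirror the loop above (continue while the count is at most ten, with at most 19 scrolls).

```python
def scroll_until(count_items, scroll_once, min_items=10, max_scrolls=19):
    """Scroll until more than `min_items` are loaded or `max_scrolls` is reached.

    Returns the number of scrolls performed.
    """
    scrolls = 0
    while count_items() <= min_items and scrolls < max_scrolls:
        scroll_once()
        scrolls += 1
    return scrolls


# Simulated feed: starts with three listings, each scroll loads four more.
loaded = [3]

def fake_scroll():
    loaded[0] += 4

scrolls = scroll_until(lambda: loaded[0], fake_scroll)
print(scrolls, loaded[0])  # prints "2 11"
```

With a real browser, count_items would count the listing elements and scroll_once would run the execute_script() call.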
extractDetails()
This function extracts the necessary data from the HTML elements obtained via getElements(). It accepts a dict that maps each category to its extracted HTML elements, loops through them, retrieves the data points, and returns another dict containing the extracted POI data.
Here is how it looks:
def extractDetails(data):
    places_of_interest = {}
    for info in data:
        category_data = []
        for d in data[info]:
            # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
            soup = BeautifulSoup(d, 'html.parser')
            try:
                url = soup.find('div',{'class':'Rwjeuc'}).a['href']
            except Exception:
                url = "Not Available"
            name = soup.find('div',{'class':'qBF1Pd'}).text
            try:
                rating = soup.find('span',{'class':'MW4etd'}).text
            except Exception:
                rating = 'Not Available'
            try:
                review_count = soup.find('span',{'class':'UY7F9'}).text.replace('(','').replace(')','')
            except Exception:
                review_count = 'Not Available'
            # Address and phone sit in '·'-separated detail rows
            details = soup.find_all('div',{'class':'W4Efsd'})
            try:
                address = details[2].text.split('·')[2]
            except Exception:
                address = 'Not Available'
            try:
                phone = details[3].text.split('·')[1]
            except Exception:
                phone = 'Not Available'
            all_details = {
                'Name':name,
                'Rating':rating,
                'Review Count':review_count,
                'Address':address,
                'Phone':phone,
                'Website':url
            }
            category_data.append(all_details)
        places_of_interest[info] = category_data
    return places_of_interest
This code initializes an empty dict that will hold the POI data across all categories.
It then iterates through each key in the input dict:
1. Defines an empty list for one category’s POI data.
2. Loops through the HTML elements in that category, where each iteration
- Parses the element with BeautifulSoup
- Extracts the required details
- Saves them in a dict
- Appends the dict to the list defined earlier
3. Updates the main dict with the category name as the key and the extracted data as the value.
Finally, the function returns the dict containing the extracted POI data.
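The address and phone number come from text rows whose parts are separated by a "·" character, which is why the function splits on it and falls back to 'Not Available' when a part is missing. That pattern can be isolated in a small helper, sketched here with a hypothetical name (field_at) and a made-up sample row; it also strips the surrounding whitespace that the split leaves behind.

```python
def field_at(text, index, default="Not Available", sep="·"):
    """Return the stripped part of `text` at `index`, or `default` if absent."""
    parts = [part.strip() for part in text.split(sep)]
    if 0 <= index < len(parts) and parts[index]:
        return parts[index]
    return default


# Made-up detail row for illustration.
sample = "Bank · Open 24 hours · 211 Main Ave"
print(field_at(sample, 2))  # prints "211 Main Ave"
print(field_at(sample, 5))  # prints "Not Available"
```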
getData()
This function integrates getElements() and extractDetails().
Start by prompting the user for a location using input().
search = input('enter a place')
Next, use Geopy to get the latitude and longitude.
geolocator = Nominatim(user_agent='poi')
location = geolocator.geocode(search)
lat = location.latitude
long = location.longitude
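Nominatim's geocode() returns None when it cannot match the query, in which case accessing location.latitude raises an AttributeError. A small guard makes that failure explicit. The sketch below takes the geocoding function as a parameter so it can be tried without network access; resolve_coordinates and fake_geocode are hypothetical names, and the coordinates are illustrative.

```python
from types import SimpleNamespace


def resolve_coordinates(geocode, query):
    """Return (latitude, longitude) for `query`, or raise ValueError if unknown."""
    location = geocode(query)
    if location is None:
        raise ValueError(f"could not geocode {query!r}")
    return location.latitude, location.longitude


# Stand-in for geolocator.geocode, with illustrative coordinates.
def fake_geocode(query):
    known = {"Passaic, NJ": SimpleNamespace(latitude=40.8568, longitude=-74.1285)}
    return known.get(query)


print(resolve_coordinates(fake_geocode, "Passaic, NJ"))
```

In the script, you would pass geolocator.geocode as the first argument.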
Create an array of categories that will be used to construct the URLs:
categories = ['banks', 'car washes', 'clinics', 'stores', 'hotels', 'Pharmacies']
Iterate through these categories and call getElements() in each iteration to collect HTML elements into a dictionary.
poi_data = {}
for category in categories:
    poi_data[category] = getElements(category,lat,long)
    print(f'{category} data extracted')
Pass the dict to extractDetails(), which returns another dict containing the extracted POI data across all categories.
details = extractDetails(poi_data)
Finally, save this extracted data into a JSON file.
with open(f'{search}_poi.json','w',encoding='utf-8') as f:
    json.dump(details, f, indent=4, ensure_ascii=False)
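Because the file name is built from whatever the user typed, input such as "Passaic, NJ" or a string containing a slash can produce an awkward or invalid path. One possible approach, sketched below with hypothetical helpers (output_path and save_details), is to reduce the search string to letters, digits, and underscores before saving. This simple pattern also replaces non-ASCII letters, which may or may not be what you want.

```python
import json
import re
from pathlib import Path


def output_path(search, directory="."):
    """Build a file path from `search`, keeping only ASCII letters and digits."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", search).strip("_") or "location"
    return Path(directory) / f"{slug}_poi.json"


def save_details(details, search, directory="."):
    """Write `details` as UTF-8 JSON and return the path that was written."""
    path = output_path(search, directory)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(details, f, indent=4, ensure_ascii=False)
    return path


print(output_path("Passaic, NJ").name)  # prints "Passaic_NJ_poi.json"
```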
You can now run the complete script by calling getData():
if __name__ == "__main__":
    getData()
The output file groups the results by category. Each entry in a category will resemble this format:
{
    "Name": "Valley Bank ATM",
    "Rating": "4.1",
    "Review Count": "54",
    "Address": " 211 Main Ave",
    "Phone": " (973) 777-6441",
    "Website": "https://locations.valley.com/nj/passaic/valley-bank-2a.html"
}
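Every value in the record is a string, and the address and phone number carry a leading space left over from splitting. If you plan to sort or filter the data, a light clean-up pass helps. The following sketch is one possible way to do it, assuming all six keys are present and every value is a string; normalize_record is a hypothetical helper, not part of the original script.

```python
def normalize_record(record):
    """Strip whitespace and convert the rating and review count to numbers."""
    cleaned = {key: value.strip() for key, value in record.items()}

    def to_number(value, cast):
        # Values such as "Not Available" cannot be converted and become None.
        try:
            return cast(value.replace(",", ""))
        except ValueError:
            return None

    cleaned["Rating"] = to_number(cleaned["Rating"], float)
    cleaned["Review Count"] = to_number(cleaned["Review Count"], int)
    return cleaned


record = {
    "Name": "Valley Bank ATM",
    "Rating": "4.1",
    "Review Count": "54",
    "Address": " 211 Main Ave",
    "Phone": " (973) 777-6441",
    "Website": "https://locations.valley.com/nj/passaic/valley-bank-2a.html",
}
print(normalize_record(record))
```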
Complete Code Example
Here’s the entire code to extract Google Maps POI data.
from selenium import webdriver
from selenium.webdriver.common.by import By
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup
import json, time
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36")
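def getElements(category, lat, long):
    browser = webdriver.Chrome(options=options)
    try:
        url = f"https://www.google.com/maps/search/{category}/@{lat},{long}"
        browser.get(url)
        time.sleep(3)
        try:
            results = browser.find_element(By.XPATH,f'//div[@aria-label="Results for {category}"]')
        except Exception:
            # Results panel not found: report the URL and return no listings
            print(url)
            return []
        # Collect the HTML up front so the result is a list of strings even if no scrolling is needed
        listings = [el.get_attribute('outerHTML') for el in results.find_elements(By.CLASS_NAME,"lI9IFe")]
        linkCount = len(listings)
        i = 1
        while linkCount <= 10 and i < 20:
            try:
                # Scroll the results panel to its bottom to trigger lazy-loading
                browser.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", results)
                listings = [el.get_attribute('outerHTML') for el in results.find_elements(By.CLASS_NAME,"lI9IFe")]
                linkCount = len(listings)
                i += 1
            except Exception as e:
                print(e)
                break
        return listings
    finally:
        # Close the browser so each category does not leave a Chrome process running
        browser.quit()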
def extractDetails(data):
    places_of_interest = {}
    for info in data:
        category_data = []
        for d in data[info]:
            # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
            soup = BeautifulSoup(d, 'html.parser')
            try:
                url = soup.find('div',{'class':'Rwjeuc'}).a['href']
            except Exception:
                url = "Not Available"
            name = soup.find('div',{'class':'qBF1Pd'}).text
            try:
                rating = soup.find('span',{'class':'MW4etd'}).text
            except Exception:
                rating = 'Not Available'
            try:
                review_count = soup.find('span',{'class':'UY7F9'}).text.replace('(','').replace(')','')
            except Exception:
                review_count = 'Not Available'
            # Address and phone sit in '·'-separated detail rows
            details = soup.find_all('div',{'class':'W4Efsd'})
            try:
                address = details[2].text.split('·')[2]
            except Exception:
                address = 'Not Available'
            try:
                phone = details[3].text.split('·')[1]
            except Exception:
                phone = 'Not Available'
            all_details = {
                'Name':name,
                'Rating':rating,
                'Review Count':review_count,
                'Address':address,
                'Phone':phone,
                'Website':url
            }
            category_data.append(all_details)
        places_of_interest[info] = category_data
    return places_of_interest
def getData():
    search = input('enter a place')
    print('Decoding latitude and longitude')
    geolocator = Nominatim(user_agent='poi')
    location = geolocator.geocode(search)
    # geocode() returns None when the place cannot be matched
    if location is None:
        print(f'Could not find a location for {search}')
        return
    lat = location.latitude
    long = location.longitude
    categories = ['banks', 'car washes', 'clinics', 'stores', 'hotels', 'Pharmacies']
    print('Commencing extraction')
    poi_data = {}
    for category in categories:
        poi_data[category] = getElements(category,lat,long)
        print(f'{category} data extracted')
    details = extractDetails(poi_data)
    print('Extraction completed')
    with open(f'{search}_poi.json','w',encoding='utf-8') as f:
        json.dump(details, f, indent=4, ensure_ascii=False)

if __name__ == "__main__":
    getData()
Code Limitations
While this tutorial demonstrates how to scrape Google Maps POI data effectively, there are limitations:
- It is not suitable for large-scale web scraping since it lacks techniques to bypass anti-scraping measures.
- You must monitor changes in Google Maps’ HTML structure; any alterations will require updates to your code to avoid breaking functionality.
- The code only extracts six data points; if you want more, you’ll need to modify the code further.
Alternative POI Sources: ScrapeHero Cloud and Datastore
If you prefer not to code yourself, consider using ScrapeHero’s alternative sources for POI data through its Cloud and Datastore.
ScrapeHero Cloud
ScrapeHero Cloud is a web scraping platform that offers no-code web scrapers. Its Google Maps Search Results Scraper allows you to quickly gather POI data with just a few clicks.
To use this scraper for Google Maps POI data, follow these steps:
- Sign up for ScrapeHero Cloud
- Create a new project
- Name the Project
- Enter the search queries
- Click ‘Gather Data’
- Download the data when finished
ScrapeHero Datastore
ScrapeHero Datastore simplifies the process even further by directly providing POI data. You can easily obtain high-quality data by:
- Visiting ScrapeHero Datastore
- Adding the desired data to your cart
- Navigating to your cart
- Completing the payment process
Why Use ScrapeHero’s Web Scraping Service?
Scraping a few dozen POIs with your own code is manageable, but large-scale scraping of thousands of POIs across multiple locations is more complex. This is where ScrapeHero’s fully managed web scraping service comes into play.
ScrapeHero offers a comprehensive service that handles the entire scraping process for you.
We provide custom solutions to handle dynamic websites like Google Maps for large-scale projects. You can forget about managing proxies, CAPTCHAs, or any other complexities associated with scraping protected sites.