Need Points of Interest Data? Here’s How to Scrape Google Maps POI Data

Google Maps is a highly dynamic website, which makes scraping POI (Points of Interest) data challenging. Still, it’s possible: you can use Python and Selenium to navigate Google Maps, render JavaScript, and extract the necessary data.

This tutorial shows you how to scrape POI data from Google Maps.

Data Scraped From Google Maps

The tutorial scrapes POI data from Google Maps across six categories:

  • Banks
  • Car Washes
  • Clinics
  • Stores
  • Hotels
  • Pharmacies

For each point of interest, the code extracts six data points:

  • Name
  • Rating
  • Review Count
  • Address
  • Phone Number
  • Website

You need to analyze the HTML code of Google Maps’s search results page to find reliable ways to locate these data points. Once you do that, you can begin setting up the environment.
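For example, once you spot the class names in your browser’s dev tools (qBF1Pd for the name and MW4etd for the rating, as used later in this tutorial), you can sanity-check them against a saved fragment with BeautifulSoup. The HTML below is a made-up sketch of one listing; Google obfuscates these class names and changes them over time, so always re-verify them yourself:

```python
from bs4 import BeautifulSoup

# A simplified, made-up fragment mimicking one listing in the results panel.
# The class names (lI9IFe, qBF1Pd, MW4etd) are obfuscated by Google and
# change periodically, so re-check them in your browser's dev tools.
html = '''
<div class="lI9IFe">
  <div class="qBF1Pd">Valley Bank ATM</div>
  <span class="MW4etd">4.1</span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
name = soup.find('div', {'class': 'qBF1Pd'}).text
rating = soup.find('span', {'class': 'MW4etd'}).text
print(name, rating)  # Valley Bank ATM 4.1
```

If a selector stops returning data, this kind of quick check against a freshly saved page is the fastest way to find the new class name.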

As mentioned above, this tutorial uses Selenium for data scraping; if you want to use Playwright, read this article on how to scrape Google Maps.

Scrape Google Maps POI Data: The Environment

The tutorial requires three external libraries, which you can install with pip:

  1. Selenium: Enables interaction with web pages, execution of JavaScript, and data extraction.
  2. BeautifulSoup: Offers intuitive methods for extracting data from HTML code.
  3. Geopy: Provides latitude and longitude for a given location.

pip install selenium beautifulsoup4 geopy

Want to learn more about scraping with Selenium? Read our article on how to scrape a dynamic website.

Scrape Google Maps POI Data: The Code

1. Import Packages

Start by importing the necessary modules and classes from the aforementioned packages.

from selenium import webdriver
from selenium.webdriver.common.by import By
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup

import json, time

In this code snippet: 

  • webdriver: Controls the Selenium browser.
  • By: Specifies the selector type for data extraction.
  • Nominatim: Retrieves latitude and longitude for a location.
  • json: Saves the extracted data as a JSON file.
  • time: Provides the sleep() function that pauses the script execution for a specified duration.

2. Define Functions

Define three functions:

  • getElements(): Returns the HTML code of the elements containing POIs.
  • extractDetails(): Extracts required data points from the HTML elements.
  • getData(): Calls the above two functions and saves the extracted data as a JSON file.

Let’s look at the functions in detail.

getElements()

The function takes a category, latitude, and longitude as inputs and returns an array containing the HTML code of POI listings for that specific location.

Begin by defining browser options (headless mode and a custom user agent) and launching the Selenium browser with them:

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36")

browser = webdriver.Chrome(options=options)

Construct the URL of the page containing POI listings, which includes the category, latitude, and longitude:

url = f"https://www.google.com/maps/search/{category}/@{lat},{long}"
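Browsers tolerate raw spaces in the path, but if a category contains spaces (like “car washes”), it is safer to percent-encode it before building the URL. A small sketch using the standard library (the coordinates are illustrative):

```python
from urllib.parse import quote

# Categories containing spaces are safer to percent-encode before being
# placed in the URL path. The coordinates below are illustrative.
category = 'car washes'
lat, long = 40.8568, -74.1285

url = f"https://www.google.com/maps/search/{quote(category)}/@{lat},{long}"
print(url)  # https://www.google.com/maps/search/car%20washes/@40.8568,-74.1285
```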

Navigate to the URL using the get() method of the Selenium WebDriver. Pause execution for three seconds to ensure all required elements are loaded:

browser.get(url)
time.sleep(3)

Locate the div element containing the listings to find the elements holding the POI data:

results = browser.find_element(By.XPATH,f'//div[@aria-label="Results for {category}"]')

Since the page lazy-loads the POI elements, you need to scroll the results panel. Set an upper limit on the number of scrolls while ensuring at least ten elements are loaded:

listings = results.find_elements(By.CLASS_NAME, "lI9IFe")
linkCount = len(listings)
i = 1

while linkCount <= 10 and i < 20:

    try:
        # Scroll the results panel to trigger lazy-loading of more listings
        browser.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", results)

        listings = results.find_elements(By.CLASS_NAME, "lI9IFe")
        linkCount = len(listings)
        i += 1

    except Exception as e:
        print(e)
        break

listings = [listing.get_attribute('outerHTML') for listing in listings]

The loop continues until more than ten listings have loaded or the scroll limit of twenty attempts is reached.

The listings’ HTML code is collected into an array, which the function returns:

return listings

extractDetails()

This function extracts the necessary data from the HTML elements obtained via getElements(). It accepts a dict of extracted HTML elements keyed by category, loops through them, retrieves the data points, and returns another dict containing the extracted POI data.

Here is how it looks:

def extractDetails(data):

    places_of_interest = {}

    for info in data:
        category_data = []

        for d in data[info]:  
            soup = BeautifulSoup(d, 'html.parser')

            try:
                url = soup.find('div',{'class':'Rwjeuc'}).a['href']
            except:
                url = "Not Available"
            name = soup.find('div',{'class':'qBF1Pd'}).text
            try:
                rating = soup.find('span',{'class':'MW4etd'}).text
            except:
                rating = 'Not Available'
            try:
                review_count = soup.find('span',{'class':'UY7F9'}).text.replace('(','').replace(')','')
            except:
                review_count = 'Not Available'

            details = soup.find_all('div',{'class':'W4Efsd'})

            try:
                address = details[2].text.split('·')[2]
            except:
                address = 'Not Available'
            try:
                phone = details[3].text.split('·')[1]
            except:
                phone = 'Not Available'

            all_details = {
                'Name':name,
                'Rating':rating,
                'Review Count':review_count,
                'Address':address,
                'Phone':phone,
                'Website':url
            }        

            category_data.append(all_details)
        places_of_interest[info] = category_data

    return places_of_interest

This code initializes an empty dict that will hold the POI data across all categories.

It then iterates through each key in the dict:

1. Defining an empty array for one category’s POI data.

2. Looping through the HTML elements in that category, where each iteration:

  1. Parses the element with BeautifulSoup
  2. Extracts the required details
  3. Saves them in a dict
  4. Appends the dict to the array defined earlier

3. Updating the main dict with category names as keys and extracted data as values.

Finally, the function returns the dict containing the extracted POI data. 
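The address and phone extraction depends on Google Maps joining each detail row’s fields with a “·” separator; the indices in details[2].text.split('·')[2] and details[3].text.split('·')[1] pick out fields purely by position. An illustration with made-up detail rows, which also shows the leading spaces that split() leaves behind (visible in the sample output later):

```python
# Google Maps renders each detail row as a '·'-separated string, so the
# split indices matter. These rows are made-up examples of that format.
address_row = "Bank · Open ⋅ Closes 5 PM · 211 Main Ave"
phone_row = "Open ⋅ Closes 5 PM · (973) 777-6441"

address = address_row.split('·')[2]
phone = phone_row.split('·')[1]

print(repr(address))  # ' 211 Main Ave'
print(repr(phone))    # ' (973) 777-6441'
```

Note that the “⋅” between opening hours is a different Unicode character from the “·” field separator, which is why the hours don’t get split apart. If you want cleaner output, call .strip() on the extracted fields.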

getData()

This function ties getElements() and extractDetails() together.

Start by prompting the user for a location using input().

search = input('Enter a place: ')

Next, use Geopy to get the latitude and longitude.

geolocator = Nominatim(user_agent='poi')
location = geolocator.geocode(search)
lat = location.latitude
long = location.longitude
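One caveat: geocode() returns None when Nominatim cannot resolve the query, and reading .latitude on None raises an AttributeError. A small guard helps; in this sketch, FakeLocation and coordinates_or_error are hypothetical stand-ins so the example runs without network access:

```python
from collections import namedtuple

# Stand-in for geopy's Location object (which exposes .latitude/.longitude),
# used here so the sketch runs without a network call.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])

def coordinates_or_error(location):
    """Return (lat, long), or raise a clear error when geocoding failed."""
    if location is None:
        raise ValueError('Location not found; check the spelling and retry.')
    return location.latitude, location.longitude

print(coordinates_or_error(FakeLocation(40.8568, -74.1285)))  # (40.8568, -74.1285)
```

In the real script, you would pass geolocator.geocode(search) to such a guard before reading the coordinates.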

Create an array of categories that will be used to construct the URLs:

categories = ['banks', 'car washes', 'clinics', 'stores', 'hotels', 'pharmacies']

Iterate through these categories and call getElements() in each iteration to collect HTML elements into a dictionary.

poi_data = {}
for category in categories:
    poi_data[category] = getElements(category, lat, long)
    print(f'{category} data extracted')

Pass the dict to extractDetails(), which returns another dict containing the extracted POI data across all categories.

details = extractDetails(poi_data)


Finally, save this extracted data into a JSON file. 

with open(f'{search}_poi.json','w',encoding='utf-8') as f:
    json.dump(details, f, indent=4, ensure_ascii=False)

You can now run the complete script by calling getData():

if __name__ == "__main__":
    getData()

The results from extracting Google Maps POI data will resemble this format:

{
    "Name": "Valley Bank ATM",
    "Rating": "4.1",
    "Review Count": "54",
    "Address": " 211 Main Ave",
    "Phone": " (973) 777-6441",
    "Website": "https://locations.valley.com/nj/passaic/valley-bank-2a.html"
}
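Since ratings and review counts are saved as strings (with “Not Available” as a fallback), downstream analysis needs a small parsing step. A sketch with illustrative data; top_rated is a hypothetical helper, not part of the tutorial script:

```python
import json

# Illustrative data in the same shape the script writes to the JSON file.
poi_data = {
    "banks": [
        {"Name": "Valley Bank ATM", "Rating": "4.1", "Review Count": "54"},
        {"Name": "Corner Bank", "Rating": "Not Available", "Review Count": "Not Available"},
    ]
}

def top_rated(data, minimum=4.0):
    """Collect POI names whose rating parses as a number >= minimum."""
    names = []
    for places in data.values():
        for place in places:
            try:
                if float(place["Rating"]) >= minimum:
                    names.append(place["Name"])
            except ValueError:
                pass  # skip 'Not Available' entries
    return names

# Round-trip through JSON exactly as the script saves and reloads it.
restored = json.loads(json.dumps(poi_data, ensure_ascii=False))
print(top_rated(restored))  # ['Valley Bank ATM']
```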

Complete Code Example

Here’s the entire code to extract Google Maps POI data.

from selenium import webdriver
from selenium.webdriver.common.by import By
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup

import json, time

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36")

def getElements(category, lat, long):

    browser = webdriver.Chrome(options=options)
    url = f"https://www.google.com/maps/search/{category}/@{lat},{long}"


    browser.get(url)
    time.sleep(3)


    try:
        results = browser.find_element(By.XPATH,f'//div[@aria-label="Results for {category}"]')
    except:
        # The results panel did not load; skip this category
        print(f'No results panel found at {url}')
        browser.quit()
        return []
    listings = results.find_elements(By.CLASS_NAME, "lI9IFe")
    linkCount = len(listings)
    i = 1

    while linkCount <= 10 and i < 20:

        try:
            # Scroll the results panel to trigger lazy-loading of more listings
            browser.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", results)

            listings = results.find_elements(By.CLASS_NAME, "lI9IFe")
            linkCount = len(listings)
            i += 1
        except Exception as e:
            print(e)
            break

    # Extract each listing's HTML before closing the browser
    listings = [listing.get_attribute('outerHTML') for listing in listings]
    browser.quit()

    return listings


def extractDetails(data):

    places_of_interest = {}


    for info in data:
        category_data = []

        for d in data[info]:  
            soup = BeautifulSoup(d, 'html.parser')

            try:
                url = soup.find('div',{'class':'Rwjeuc'}).a['href']
            except:
                url = "Not Available"

            name = soup.find('div',{'class':'qBF1Pd'}).text


            try:
                rating = soup.find('span',{'class':'MW4etd'}).text
            except:
                rating = 'Not Available'
            try:
                review_count = soup.find('span',{'class':'UY7F9'}).text.replace('(','').replace(')','')
            except:
                review_count = 'Not Available'

            details = soup.find_all('div',{'class':'W4Efsd'})

            try:
                address = details[2].text.split('·')[2]
            except:
                address = 'Not Available'
            try:
                phone = details[3].text.split('·')[1]
            except:
                phone = 'Not Available'


            all_details = {
                'Name':name,
                'Rating':rating,
                'Review Count':review_count,
                'Address':address,
                'Phone':phone,
                'Website':url
            }        


            category_data.append(all_details)

        places_of_interest[info] = category_data

    return places_of_interest

def getData():


    search = input('Enter a place: ')

    print('Decoding latitude and longitude')

    geolocator = Nominatim(user_agent='poi')
    location = geolocator.geocode(search)

    lat = location.latitude
    long = location.longitude

    categories = ['banks', 'car washes', 'clinics', 'stores', 'hotels', 'pharmacies']

    print('Commencing extraction')

    poi_data = {}
    for category in categories:
        poi_data[category] = getElements(category,lat,long)
        print(f'{category} data extracted')


    details = extractDetails(poi_data)

    print('Extraction completed')

    with open(f'{search}_poi.json','w',encoding='utf-8') as f:
        json.dump(details, f, indent=4, ensure_ascii=False)

if __name__ == "__main__":
    getData()

Code Limitations

While this tutorial demonstrates how to scrape Google Maps POI data effectively, there are limitations:

  • It is not suitable for large-scale web scraping since it lacks techniques to bypass anti-scraping measures.
  • You must monitor changes in Google Maps’ HTML structure; any alterations will require updates to your code to avoid breaking functionality. 
  • The code only extracts six data points; if you want more, you’ll need to modify the code further.

Alternative POI Sources: ScrapeHero Cloud and Datastore

If you prefer not to code yourself, consider using ScrapeHero’s alternative sources for POI data through its Cloud and Datastore. 

ScrapeHero Cloud

ScrapeHero Cloud is a web scraping platform that offers no-code web scrapers. Its Google Maps Search Results Scraper allows you to quickly gather POI data with just a few clicks.

To use this scraper for Google Maps POI data, follow these steps:

  1. Sign up for ScrapeHero Cloud
  2. Create a new project 
  3. Name the Project
  4. Enter the search queries
  5. Click ‘Gather Data’
  6. Download the data when finished

ScrapeHero Datastore

ScrapeHero Datastore simplifies the process even further by directly providing POI data. You can easily obtain high-quality data by:

  1. Visiting ScrapeHero datastore
  2. Adding the desired data to your cart
  3. Navigating to your cart
  4. Completing the payment process

Why Use ScrapeHero’s Web Scraping Service?

Scraping a few dozen POIs with your own code might be manageable, but large-scale scraping of thousands of POIs across multiple locations becomes far more complex. This is where ScrapeHero’s fully managed web scraping service comes into play.

ScrapeHero offers a comprehensive service that handles the entire scraping process for you.

We provide custom solutions to handle dynamic websites like Google Maps for large-scale projects. You can forget about managing proxies, CAPTCHAs, or any other complexities associated with scraping protected sites.
