Web Scraping Hotel Prices from Hotels.com

ScrapeHero
Last Updated: December 10, 2024

Set Up The Environment
Data Scraped from Hotels.com
Scrape Hotel Prices: The Code
Code Limitation
Wrapping Up

Hotels.com allows you to find hotels, their prices, addresses, and other details. However, to analyze hotel prices, you need a large data set, which is impractical to obtain manually. Therefore, a more practical approach is web scraping hotel prices.

In this tutorial, you will learn how to scrape hotel prices with Selenium, a browser automation library, using Python.

Set Up The Environment

You will use Python to run Selenium in this tutorial, although it is also available in other languages. Selenium controls a browser through its webdriver module, which makes it well suited to scraping dynamic websites.

You can install Selenium with pip. Besides Selenium, this tutorial uses Pandas, a library for manipulating structured data, to write the extracted data to a CSV file. pip can install Pandas as well.

pip install pandas selenium
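
Before moving on, you may want to confirm that both packages are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name missing_packages() is made up for this example.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the package names that cannot be found by this interpreter."""
    return [name for name in names if find_spec(name) is None]


# An empty list means both packages are installed for this interpreter.
print(missing_packages(["selenium", "pandas"]))
```

If the printed list is not empty, rerun the pip command with the same interpreter you use to run the scraper.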

Data Scraped from Hotels.com

This code will scrape four details from each hotel listing:

the name,
the price,
the rating,
the address

To locate these with Selenium, you must analyze the page's HTML and work out an XPath or CSS selector for each element. You can also select elements by class name or ID alone, but this does not always work, for example when several elements share the same class.
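
To get a feel for how an XPath picks out elements, here is a small sketch that runs without a browser. It uses the standard library's xml.etree.ElementTree, which supports a subset of XPath, on a made-up snippet of markup. The snippet is an assumption for illustration and is not the real Hotels.com HTML.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration only
SAMPLE = """
<div>
  <a class="uitk-card-link" href="/hotel/1">Hotel One</a>
  <a class="other-link" href="/about">About</a>
  <a class="uitk-card-link" href="/hotel/2">Hotel Two</a>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# Select every <a> element whose class attribute is exactly "uitk-card-link"
links = root.findall(".//a[@class='uitk-card-link']")
names = [link.text for link in links]
print(names)
```

The same idea applies in Selenium: the expression describes where the element sits and which attributes it has, and the library returns whatever matches.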

Scrape Hotel Prices: The Code

You will use three Python libraries: selenium, Pandas, and time. You must import these.

The sleep() function from the time module pauses the program for a set time before it moves on to the next step. Without the pause, the next step may fail if it depends on data the current step has not finished loading.

Note: This code imports the Selenium names it needs (webdriver, Keys, and By) individually for convenience.

from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pandas

Instead of writing the steps at the top level of the script, you will put them in a function, parse(), and then call it.

In the parse() function, you will start the Chrome browser and go to the target URL, hotels.com.

response = webdriver.Chrome()
response.get(url)

Hotels.com has an interactive button that you must click to enter the location. You can find this button with its XPath using the find_element() method.

Then, you can use the click() method to click on it, which will open a search box.

location = response.find_element(By.XPATH,'//button[contains(@aria-label,"Going to")]')
location.click()

Next, you must find the search box and type the location into it with the send_keys() method. The send_keys() method can also send special keys such as Return; here, you will send Return after filling in the location to submit it.

searchKeyElement = response.find_element(By.XPATH,'//input[contains(@id,"destination")]')
searchKeyElement.send_keys(searchKey)
searchKeyElement.send_keys(Keys.RETURN)

Now, you must press the search button. As you did above, locate the element using the XPath and use click().

submitButton = response.find_element(By.XPATH,'//button[@type="submit"]')
submitButton.click()

Next, you will sort the listings by price from low to high, which is the second option in the sort dropdown. To select it, locate the dropdown element and send it the Down arrow key with send_keys().

dropDownButton = response.find_element(By.XPATH,'//select[contains(@id,"sort-filter-dropdown-sort")]')
dropDownButton.send_keys(Keys.DOWN)

After you change the sort option, the website will take some time to sort the listings. You must wait for it to finish before the next step; this is where you use the sleep() function from the time module.

sleep(5)

The above code makes the program wait 5 seconds before moving to the next line.
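
A fixed pause can be too short on a slow connection and wastefully long on a fast one. An alternative is to poll until the data you need appears. Selenium ships WebDriverWait for this purpose; the sketch below shows the same idea in plain Python so that it runs without a browser. The helper name wait_until() and its parameters are made up for this example.

```python
from time import monotonic, sleep


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        sleep(interval)


# Hypothetical use with the tutorial's driver object:
#   hotels = wait_until(
#       lambda: response.find_elements(By.XPATH, '//a[@class="uitk-card-link"]')
#   )
```

Because find_elements() returns an empty list when nothing matches, the lambda stays falsy until at least one listing is present.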

You can now find all the listings using the find_elements() method, which returns a list of matching elements.

hotels = response.find_elements(By.XPATH,'//a[@class="uitk-card-link"]')

Then use a loop to extract data from each listing:

1. Click on it

for hotel in hotels[:10]:
  hotel.click()

2. Switch the webdriver’s focus to the new window

new_window = response.window_handles[1]
response.switch_to.window(new_window)

3. Find and extract the elements (name, price, address, and rating)

hotelName = response.find_element(By.TAG_NAME,'h1').text
price = response.find_element(By.XPATH,"//div[@data-stid='price-summary']//span/div").text
rating = response.find_element(By.CLASS_NAME,"uitk-badge-base-text").text
address = response.find_element(By.XPATH,'//div[@class="uitk-text uitk-type-300 uitk-text-default-theme uitk-layout-flex-item uitk-layout-flex-item-flex-basis-full_width"]').text

4. Store the extracted values in a dictionary, item, and append it to the file list

file.append(item)

5. Close the new window and switch back to the listings page

response.close()
response.switch_to.window(response.window_handles[0])

6. Again, get all the hotel listings

hotels = response.find_elements(By.XPATH,'//a[@class="uitk-card-link"]')
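
Two details of the loop above are fragile. It assumes the new window is always window_handles[1], and find_element() raises NoSuchElementException when an element is missing, which would stop the whole run. The sketch below shows one possible way to soften both. The helper names pick_new_window() and text_or_none() are made up for this example and are not part of Selenium.

```python
def pick_new_window(handles, known_handles):
    """Return the first window handle that is not in known_handles."""
    known = set(known_handles)
    for handle in handles:
        if handle not in known:
            return handle
    raise LookupError("no new window was opened")


def text_or_none(driver, by, value):
    """Return the text of the first matching element, or None if nothing matches."""
    matches = driver.find_elements(by, value)  # empty list when nothing matches
    return matches[0].text if matches else None


# Plain-value demonstration: "hotel" is the handle we have not seen before
print(pick_new_window(["main", "hotel"], ["main"]))

# Hypothetical use inside the tutorial's loop:
#   original = response.current_window_handle
#   hotel.click()
#   response.switch_to.window(pick_new_window(response.window_handles, [original]))
#   rating = text_or_none(response, By.CLASS_NAME, "uitk-badge-base-text")
```

With this approach, a listing that lacks one field produces None in that column instead of ending the scrape.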

Finally, the function returns the file variable containing all the extracted data.

Then, you use Pandas to write the data to a CSV file.

file = parse('http://www.hotels.com')
df = pandas.DataFrame.from_dict(file)
df.to_csv("hotels.csv")
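
The scraped price is text, so you will likely need to convert it to a number before analyzing it. The sketch below assumes the price text looks something like "$1,234"; the actual format on Hotels.com may differ, and the helper name parse_price() is made up for this example.

```python
import re


def parse_price(text):
    """Extract the first number from a price string, or return None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if match is None:
        return None
    # Drop thousands separators before converting
    return float(match.group().replace(",", ""))


print(parse_price("$1,234"))  # 1234.0
```

With Pandas, you could apply it to the whole column, for example df["price"].map(parse_price), before calling to_csv().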

Here are the results of the data extraction.

Here is the full code to scrape hotel prices and other details from Hotels.com.

#!/usr/bin/env python

from time import sleep

import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


def parse(url):
    searchKey = "Las Vegas"  # Change this to your city

    # Start Chrome and open the target site
    response = webdriver.Chrome()
    response.get(url)

    # Click the button that opens the destination search box
    location = response.find_element(By.XPATH,'//button[contains(@aria-label,"Going to")]')
    location.click()

    # Type the location and submit it with the Return key
    searchKeyElement = response.find_element(By.XPATH,'//input[contains(@id,"destination")]')
    searchKeyElement.send_keys(searchKey)
    searchKeyElement.send_keys(Keys.RETURN)

    # Press the search button
    submitButton = response.find_element(By.XPATH,'//button[@type="submit"]')
    submitButton.click()

    # Move the sort dropdown to its second option (price, low to high)
    dropDownButton = response.find_element(By.XPATH,'//select[contains(@id,"sort-filter-dropdown-sort")]')
    dropDownButton.send_keys(Keys.DOWN)
    sleep(5)  # Give the page time to re-sort

    hotels = response.find_elements(By.XPATH,'//a[@class="uitk-card-link"]')
    file = []
    print(len(hotels))
    for hotel in hotels[:10]:
        # Each listing opens in a new window
        hotel.click()
        sleep(10)
        new_window = response.window_handles[1]
        response.switch_to.window(new_window)
        sleep(3)

        # Extract the details from the hotel page
        hotelName = response.find_element(By.TAG_NAME,'h1').text
        price = response.find_element(By.XPATH,"//div[@data-stid='price-summary']//span/div").text
        rating = response.find_element(By.CLASS_NAME,"uitk-badge-base-text").text
        address = response.find_element(By.XPATH,'//div[@class="uitk-text uitk-type-300 uitk-text-default-theme uitk-layout-flex-item uitk-layout-flex-item-flex-basis-full_width"]').text

        item = {
            "hotelName": hotelName,
            "price": price,
            "rating": rating,
            "address": address,
        }
        file.append(item)

        # Close the hotel window and return to the listings page
        response.close()
        response.switch_to.window(response.window_handles[0])
        hotels = response.find_elements(By.XPATH,'//a[@class="uitk-card-link"]')

    response.quit()  # Shut down the browser once scraping is done
    return file


if __name__ == '__main__':
    file = parse('http://www.hotels.com')
    df = pandas.DataFrame.from_dict(file)
    df.to_csv("hotels.csv")

Code Limitation

The primary limitation of this code is that its XPaths are tied to the current structure of Hotels.com. Whenever the site changes its HTML, you have to reanalyze the page and work out new XPaths.
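
One way to make such updates less painful is to keep every selector in a single place, so a site change means editing one dictionary rather than hunting through the script. This is only a sketch: the names SELECTORS and locate() are made up for this example, and the strategy strings "xpath", "tag name", and "class name" are the values behind Selenium's By.XPATH, By.TAG_NAME, and By.CLASS_NAME constants.

```python
# All selectors used by the scraper, keyed by what they locate
SELECTORS = {
    "listing": ("xpath", '//a[@class="uitk-card-link"]'),
    "hotelName": ("tag name", "h1"),
    "price": ("xpath", "//div[@data-stid='price-summary']//span/div"),
    "rating": ("class name", "uitk-badge-base-text"),
}


def locate(driver, name):
    """Find one element by its entry in SELECTORS."""
    by, value = SELECTORS[name]
    return driver.find_element(by, value)


# Hypothetical use with the tutorial's driver object:
#   hotelName = locate(response, "hotelName").text
```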

The code is also unsuitable for large-scale data extraction, where you must consider potential anti-scraping measures.

Wrapping Up

You can scrape hotel data from Hotels.com using Selenium. Selenium has modules that allow you to visit the website and extract data using various methods, including XPaths and CSS selectors. However, you must keep your selectors up to date with the website's structure.

Moreover, enterprise-grade web scraping requires browser farms that can run several simultaneous Selenium browser contexts and extract data, which is a massive investment.

You can forget about all this if you choose ScrapeHero. We will handle all the backend tasks, including providing robust code and managing anti-scraping measures.

ScrapeHero Services include large-scale web scraping, product monitoring, and many more. We can get you high-quality data, including about airlines and hotels. Moreover, we can make custom web scrapers according to your specifications.

You can also check out the hotel lists in our data store; we may already have your required data.
