How to Scrape Fandango using Python and LXML


Web scraping is an efficient way to extract data about movies, showtimes, seating and more from movie sites.
Imagine all the movie data that you can gather on a daily basis. You could scrape the data for a particular actor, director or genre and use the information to analyze ongoing movie trends.
This tutorial is about scraping movie details from Fandango.com, a movie booking site, which allows you to find movie overviews and current showtimes.

In this web scraping tutorial, we’ll scrape Fandango.com for the movie details based on a given location and date.

Here is a list of fields we will be extracting:

  1. Theater Name
  2. Theater Address
  3. Movie Name
  4. Show Date
  5. Zip Code/Location
  6. Duration
  7. Genre
  8. Star Rating (Out of 5)
  9. Movie Rating

Below is a screenshot of some of the data that will be scraped.

[Screenshot: movie fields to extract from Fandango]

Scraping Logic

  1. Construct the URL of the search results from Fandango. Here is the one for the zip code 20001: https://www.fandango.com/20001_movietimes?mode=general&q=20001
  2. Download the HTML of the search results page using Python Requests.
  3. Parse the page using LXML. LXML lets you navigate the HTML tree structure using XPaths. We have predefined the XPaths for the details we need in the code.
  4. Save the data to a CSV file. In this article we are only scraping the movie name, rating, genre, theater address and theater name from the first page of results, so a CSV file should be enough to hold all the data. If you would like to extract details in bulk, a JSON file is preferable. You can read about choosing your data format, just to be sure.
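Steps 1 to 3 above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's actual script: the function names, the User-Agent string, the SAMPLE_HTML markup and the XPaths are all assumptions made for the example, and the real Fandango page will need its own XPaths. Only the zip code URL pattern comes from step 1; how a city name or a show date is passed to the site is not covered here.

```python
from urllib.parse import quote

import requests
from lxml import html

BASE_URL = "https://www.fandango.com"

# Hypothetical markup used only to exercise the parser offline. The real
# Fandango page is structured differently and needs its own XPaths.
SAMPLE_HTML = """
<html><body>
  <div class="theater">
    <h3 class="theater-name">Example Cinema 10</h3>
    <p class="theater-address">123 Main St, Washington, DC 20001</p>
    <ul>
      <li class="movie"><span class="title">Example Movie</span><span class="genre">Drama</span></li>
      <li class="movie"><span class="title">Another Movie</span><span class="genre">Comedy</span></li>
    </ul>
  </div>
</body></html>
"""


def build_search_url(location):
    """Step 1: fill the location into the URL pattern shown for zip code 20001."""
    slug = quote(location.strip())
    return f"{BASE_URL}/{slug}_movietimes?mode=general&q={slug}"


def fetch_html(url, timeout=30):
    """Step 2: download the page; raises requests.HTTPError on a 4xx/5xx reply."""
    # The User-Agent value is an assumption chosen for this sketch.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; tutorial-sketch)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


def first_text(node, xpath):
    """Return the first text match for an XPath, or an empty string."""
    matches = node.xpath(xpath)
    return matches[0].strip() if matches else ""


def parse_movies(page_source):
    """Step 3: walk the HTML tree with XPaths and collect one row per movie."""
    tree = html.fromstring(page_source)
    rows = []
    for theater in tree.xpath('//div[@class="theater"]'):
        theater_name = first_text(theater, './/h3[@class="theater-name"]/text()')
        theater_address = first_text(theater, './/p[@class="theater-address"]/text()')
        for movie in theater.xpath('.//li[@class="movie"]'):
            rows.append({
                "theater_name": theater_name,
                "theater_address": theater_address,
                "movie_name": first_text(movie, './span[@class="title"]/text()'),
                "genre": first_text(movie, './span[@class="genre"]/text()'),
            })
    return rows


if __name__ == "__main__":
    print(build_search_url("20001"))
    # Parse the offline sample instead of hitting the network.
    for row in parse_movies(SAMPLE_HTML):
        print(row)
```

Parsing a saved sample first makes it easier to adjust the XPaths before sending any requests to the live site.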

Requirements

For this web scraping tutorial using Python 3, we will need some packages for downloading and parsing the HTML. Below are the package requirements.

Install Python 3 and Pip

Here is a guide to installing Python 3 on Linux: http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac users can follow this guide: http://docs.python-guide.org/en/latest/starting/install3/osx/

Windows users can go here: https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Install Packages
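
The scraping logic relies on two third-party packages, Python Requests for downloading the page and LXML for parsing it. Both can normally be installed with pip (pip3 install requests lxml). As a quick check before running the scraper, a small sketch like the one below reports which of them cannot be found. The function name missing_packages is made up for this example.

```python
from importlib.util import find_spec

# The two packages named in the scraping logic.
REQUIRED_PACKAGES = ("requests", "lxml")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```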


The Code

The complete script is available as a GitHub gist:

https://gist.github.com/scrapehero/edc9d9dffd24402a9c176862d076db18

If you would like the code in Python 2.7, check out this link. 

Running the Scraper

Assume the script is named fandango.py. If you type the script name in a command prompt or terminal followed by -h, you will see the following usage message:

usage: fandango.py [-h] location showdate

positional arguments:
  location    movie location (zipcode or city+state)
  showdate    movie show time

optional arguments:
  -h, --help  show this help message and exit
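
The help text above has the shape that Python's argparse module prints for a script with two positional arguments. A minimal sketch of such a parser is shown below. It is an illustration rather than the tutorial's actual code, and build_parser is a name chosen for this example. The argument names and help strings are copied from the usage message.

```python
import argparse


def build_parser():
    """Create a parser with the two positional arguments from the usage message."""
    parser = argparse.ArgumentParser(prog="fandango.py")
    parser.add_argument("location", help="movie location (zipcode or city+state)")
    parser.add_argument("showdate", help="movie show time")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("location:", args.location)
    print("showdate:", args.showdate)
```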

The arguments location and showdate are the keywords used to find the list of movies for a given location and date.

The location can be given as a zip code or in the format ‘City, State Abbreviation’. The showdate should be given in the format YYYY-MM-DD, as in the example below.

python3 fandango.py "Queen City, CA" "2017-12-29"

This will create a CSV file called Queen City, CA-2017-12-29-movie-results.csv in the same folder as the script. Here is some sample data extracted from Fandango.com for the command above. You can follow this tutorial if you would like to parse the address into a structured format.

[Screenshot: sample details extracted from Fandango]
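
The saving step can be sketched with Python's built-in csv module, using the location-showdate-movie-results.csv naming described above. This is a hedged sketch: the column names in FIELDNAMES are assumptions derived from the list of fields at the top of the tutorial, and output_path and save_rows are names invented for this example.

```python
import csv
from pathlib import Path

# Hypothetical column names based on the fields listed in the tutorial.
FIELDNAMES = [
    "theater_name", "theater_address", "movie_name", "show_date",
    "location", "duration", "genre", "star_rating", "movie_rating",
]


def output_path(location, showdate, folder="."):
    """Build the output file name from the two command-line arguments."""
    return Path(folder) / f"{location}-{showdate}-movie-results.csv"


def save_rows(rows, location, showdate, folder="."):
    """Write one CSV line per movie; fields missing from a row are left empty."""
    path = output_path(location, showdate, folder)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    sample = [{"theater_name": "Example Cinema 10", "movie_name": "Example Movie"}]
    print(save_rows(sample, "20001", "2017-12-29"))
```

Opening the file with newline="" follows the csv module's documented advice and avoids blank lines between rows on Windows.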

 

Known Limitations

This scraper should be able to scrape the details of movies currently showing on Fandango.com. You could go further and build a more complex scraper that collects the available seats for each showtime. If you would like to scrape thousands of pages at very short intervals, you should read “Scalable do-it-yourself scraping – How to build and run scrapers on a large scale” and “How to prevent getting blacklisted while scraping”.

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.
