Web Scraping Tutorials


LEARN HOW TO USE WEB SCRAPING TO ENHANCE PRODUCTIVITY AND AUTOMATION

We provide many step-by-step tutorials with source code for web scraping, web crawling, data extraction, headless browsers, etc.

Our web scraping tutorials are usually written in Python using libraries such as LXML, Beautiful Soup, Selectorlib and occasionally in Node.js.

The full source code is also available to download in most cases or available to be easily cloned using Git.

We also provide various in-depth articles about Web Scraping tips, techniques and the latest technologies which include the latest anti-bot technologies, methods used to safely and responsibly gather publicly available data from the Internet.

The community that has coalesced around these tutorials and their comments help anyone from a beginner hobbyist person to an advanced programmer solve some of the issues they face with web scraping.

These tutorials are frequently linked to as StackOverflow solutions and discussed on Reddit.

Please feel free to read and participate in the discussions with your comments.

All Tutorials

How to scrape Yelp Business Details using Python and LXML

How to scrape Yelp Business Details using Python and LXML

This tutorial is a follow-up of How to scrape Yelp.com for Business Listings using Python. In this tutorial, we will help you in scraping Yelp.com data from the detail page of a business. You can use URLs of businesses you are interested in OR the ones you got from part one of this tutorial. Let’s […]

The best data and file formats for scraped data

The best data and file formats for scraped data

The data we provide comes in various forms from the source and is largely text (barring rich media such as images and videos or proprietary file formats such as PDFs). Our customers need this data in various formats and the key to a successful and scalable solution that fits the best data formats for web […]

How to scrape TripAdvisor for Hotel Data, Pricing and Reviews using Python

How to scrape TripAdvisor for Hotel Data, Pricing and Reviews using Python

Step by step tutorial to scrape Tripadvisor reviews and hotel data – Name, Price Per Night, Deals Reviews, and Ratings using Python and LXML.

An API for every site using web scraping

An API for every site using web scraping

There is a lot of content available on the millions of websites on the Internet, and all of them involve some amount of programming to get them there, however, to get to all this content using a programmatic API isn’t really possible. If you need data scraped from a website in a specific format in […]

XPaths and their relevance in Web Scraping

XPaths and their relevance in Web Scraping

XPath (XML Path Language) is a syntax for defining parts of an XML document. We will explain the relevance of Xpath in web scraping. XPath is a query language for identifying and selecting nodes or elements in an XML document using a tree like representation of the document. XPath was defined by the World Wide […]

Why *not* scrape yourself

Why *not* scrape yourself

Before you get all kinds of ideas about what the topic of this article means – please look at the context – We are talking about Web Scraping here ! This post will talk about reason why not to do this yourself and why to call in a professional (wink wink – use ScrapeHero) You […]

Turn the Internet into meaningful, structured and usable data