Web Scraping Stocktwits Data

Share:

Web Scraping Stocktwits Data

StockTwits is a website that hosts share-market data from where you can pull stock data. The website is dynamic.

Therefore, you need headless browsers like Selenium for scraping StockTwits data from their official website.

However, you can also extract data from StockTwits using the URLs from which the website fetches the stock data. These URLs deliver data in a JSON format, and you can use Python requests to fetch them.

This tutorial illustrates StockTwits data scraping using Python requests and Selenium.

Data Scraped from StockTwits

The tutorial teaches you how to get data from StockTwits using Python. It scrapes three kinds of stock information.

  • Top Gainers: Stocks whose value has increased that day
    Top gainers URL on developers options required for scraping StockTwits data
  • Top Losers: Stocks whose value has decreased that day
    Top losers URL on developer options
  • Trending Stocks: Stocks traded most that day
    URL of trending stocks on developer options
  • Earnings Reported: Companies that have reported their earnings that day
    StockTwits list of companies that reported earnings

This code scraps the first three list directly from the URL that delivers this information to StockTwits. To find these URL:

  • Visit StockTwits
  • Open developer options
  • Go to the network tab
    Network tab in developer options
  • Make sure you have selected Fetch/XHR
  • Scroll in the name panel to find the URL
    Fetch/XHR tab and Name panel in developer options

The Environment

This tutorial uses Selenium to get the HTML code from the StockTwits website and Python requests to extract data from the URLs that deliver stock data.

Therefore, install Python requests and Selenium using pip.

pip install requests selenium

The tutorial uses BeautifulSoup to parse data. It is also an external library you can install using pip.

pip install beautifulsoup4

The code also requires the json module to save the extracted data to a JSON file; however, the standard Python library includes the json module.

Scraping StockTwits Data: The Code

The code logic for scraping StockTwits data

The code for Scraping StockTwits data begins with the import statements. Import the json module, BeautifulSoup, the Selenium By module, and the Selenium webdriver module. There will be a total of four import statements.

The code uses two functions: earnings() and extract():

  • The first function, earnings(), extracts details of the companies that reported their earnings that day.
  • The second function, extract(), is the code’s entry point. It asks the user what to scrape and executes the code accordingly.

earnings()

The earnings function adds the headless option to the Selenium webdriver object; this makes scraping faster by removing GUI.

options = webdriver.ChromeOptions()
options.add_argument("--headless")

Then, it launches Chrome using the options as an arguement and goes to the https://stocktwits.com/markets/calendar page using the get method.

chrome = webdriver.Chrome(options=options)
page = chrome.get("https://stocktwits.com/markets/calendar")

You can now get the HTML code of the page using the page_source attribute.

htmlContent = chrome.page_source

Then, parse the page source using BeautifulSoup. Pass the page source to the BeautifulSoup constructor.

soup = BeautifulSoup(htmlContent)

Locate all the rows inside a div element with the role row.

earnings = soup.find_all("div",{"role":"row"})

You can then iterate through all the rows and extract each detail.

companyEarnings = []


for earning in earnings[1:]:
    company = earning.find_all("p")
    symbol = company[0].text
    name = company[1].text
    price = earning.find("div",{"class":"EarningsTable_priceCell__Sxx1_"}).text

 Note: The loop begins with the second member of the earnings array as the first is the header row.

The code extracts symbols, the company name, and the stock price. You can find the symbols and the name of the company in the p tags and the price in the div tag with the class EarningsTable_priceCell__Sxx1_.

In each loop, the function appends the extracted data to an array.

    companyEarnings.append(
        {
        "Symbol":symbol,
        "Company Name": name,
        "Price": price
    }
    )

Finally, the function saves the array as a JSON file.

with open("earnings.json","w") as earningsFile:
    json.dump(companyEarnings,earningsFile,indent=4,ensure_ascii=False)

extract()

The extract() function acts as the entry point of the code. It asks you to select what to scrape and gives five choices.

The first four choices ask you what to scrape:

  • top gainers
  • top losers
  • trending stocks
  • stocks that reported their earnings that day
query = input("What to scrape? \n top-gainers [1]\n top-losers[2]\n trending stocks[3]\n stocks which reported earnings today[4]\n Insert a number (1,2,3, or 4)\n To cancel, enter 0.")

When you select the option four, the function calls earnings(). The fifth choice exits the program.

if query == "4":
        earnings()
    elif query == "0":
        return

For the other three choices, the function uses Python requests to scrape top gainers, top losers, and trending stocks. It

  • Sets the URL corresponding to the choice
  • Gets the data from that URL through Python requests
  • Saves the data to a JSON file

For the above process, extract() uses a switch().

else:
    match int(query):


        case 1:
            url = "https://api.stocktwits.com/api/2/symbols/stats/top_gainers.json?regions=US"
            name = "topGainers"


        case 2:
            url = "https://api.stocktwits.com/api/2/symbols/stats/top_losers.json?regions=US"
            name = "topLosers"


        case 3:
            url = "https://api-gw-prd.stocktwits.com/rankings/api/v1/rankings?identifier=US&identifier-type=exchange-set&limit=15&page-num=1&type=ts"
            name = "trending"


    response = requests.get(url,headers=headers)
    responseJson = json.loads(response.text)


    with open(f"{name}.json","w") as jsonFile:
        json.dump(responseJson,jsonFile,indent=4)

Finally, the function asks whether they need to restart the process afterward. If you choose yes, the extract() function gets called again—otherwise, the program exits.

more = input("Do you want to continue")


    if more.lower() == "yes":
        extract()
    else:
        return

Want to scrape financial data from Yahoo? Check this tutorial on how to scrape Yahoo Finance.

Code Limitations

Even though the code is suitable for scraping StockTwits data, you might need to alter it later. This is because StockTwits may change the site’s structure.

Or it might change the URL from which it fetches the stock data. In either case, the code may fail to execute.

Moreover, the code may not work for large-scale data extraction, as it doesn’t consider anti-scraping measures.

Wrapping Up

Using the code in this tutorial, you can scrape stock Market data from StockTwits with Python requests and Selenium.

However, you might need to change the code whenever StockTwits changes its website structure or the URL that delivers the stock data.

But you don’t have to change the code yourself; ScrapeHero can help you. We can take care of all your web scraping needs.

ScrapeHero is a full-service web scraping service provider capable of building enterprise-grade web scrapers and crawlers according to your specifications. ScrapeHero services include large-scale scraping and crawling, monitoring, and custom robotic process automation.

We can help with your data or automation needs

Turn the Internet into meaningful, structured and usable data



Please DO NOT contact us for any help with our Tutorials and Code using this form or by calling us, instead please add a comment to the bottom of the tutorial page for help

Table of content

Scrape any website, any format, no sweat.

ScrapeHero is the real deal for enterprise-grade scraping.

Ready to turn the internet into meaningful and usable data?

Contact us to schedule a brief, introductory call with our experts and learn how we can assist your needs.

Continue Reading

Transform and map scraped data

How to Transform and Map Scraped Data with Python Libraries

Learn how you can transform and map data using Python.
Using NLP to clean and structure scraped data

How to Use NLP to Clean and Structure Scraped Data

Learn how to use NLP to clean and structure scraped data.
Search engine web crawling

From Crawling to Ranking! This is How Search Engines Use Web Crawling to Index Websites!

Search engine crawling indexes web pages, making it essential for ranking and visibility in search results.
ScrapeHero Logo

Can we help you get some data?