web scraping

5 min read

Web Scraping Stocktwits Data

Matthew
Last Updated: December 11, 2024

Data Scraped from StockTwits
The Environment
Scraping StockTwits Data: The Code
Code Limitations
Wrapping Up

StockTwits is a website that hosts share-market data from where you can pull stock data. The website is dynamic.

Therefore, you need headless browsers like Selenium for scraping StockTwits data from their official website.

However, you can also extract data from StockTwits using the URLs from which the website fetches the stock data. These URLs deliver data in a JSON format, and you can use Python requests to fetch them.

This tutorial illustrates StockTwits data scraping using Python requests and Selenium.

Data Scraped from StockTwits

The tutorial teaches you how to get data from StockTwits using Python. It scrapes three kinds of stock information.

Top Gainers: Stocks whose value has increased that day
Top Losers: Stocks whose value has decreased that day
Trending Stocks: Stocks traded most that day
Earnings Reported: Companies that have reported their earnings that day

This code scraps the first three list directly from the URL that delivers this information to StockTwits. To find these URL:

Visit StockTwits
Open developer options
Go to the network tab

Make sure you have selected Fetch/XHR
Scroll in the name panel to find the URL

The Environment

This tutorial uses Selenium to get the HTML code from the StockTwits website and Python requests to extract data from the URLs that deliver stock data.

Therefore, install Python requests and Selenium using pip.

pip install requests selenium

The tutorial uses BeautifulSoup to parse data. It is also an external library you can install using pip.

pip install beautifulsoup4

The code also requires the json module to save the extracted data to a JSON file; however, the standard Python library includes the json module.

Scraping StockTwits Data: The Code

The code for Scraping StockTwits data begins with the import statements. Import the json module, BeautifulSoup, the Selenium By module, and the Selenium webdriver module. There will be a total of four import statements.

The code uses two functions: earnings() and extract():

The first function, earnings(), extracts details of the companies that reported their earnings that day.
The second function, extract(), is the code’s entry point. It asks the user what to scrape and executes the code accordingly.

earnings()

The earnings function adds the headless option to the Selenium webdriver object; this makes scraping faster by removing GUI.

options = webdriver.ChromeOptions()
options.add_argument("--headless")

Then, it launches Chrome using the options as an arguement and goes to the https://stocktwits.com/markets/calendar page using the get method.

chrome = webdriver.Chrome(options=options)
page = chrome.get("https://stocktwits.com/markets/calendar")

You can now get the HTML code of the page using the page_source attribute.

htmlContent = chrome.page_source

Then, parse the page source using BeautifulSoup. Pass the page source to the BeautifulSoup constructor.

soup = BeautifulSoup(htmlContent)

Locate all the rows inside a div element with the role row.

earnings = soup.find_all("div",{"role":"row"})

You can then iterate through all the rows and extract each detail.

companyEarnings = []


for earning in earnings[1:]:
    company = earning.find_all("p")
    symbol = company[0].text
    name = company[1].text
    price = earning.find("div",{"class":"EarningsTable_priceCell__Sxx1_"}).text

Note: The loop begins with the second member of the earnings array as the first is the header row.

The code extracts symbols, the company name, and the stock price. You can find the symbols and the name of the company in the p tags and the price in the div tag with the class EarningsTable_priceCell__Sxx1_.

In each loop, the function appends the extracted data to an array.

    companyEarnings.append(
        {
        "Symbol":symbol,
        "Company Name": name,
        "Price": price
    }
    )

Finally, the function saves the array as a JSON file.

with open("earnings.json","w") as earningsFile:
    json.dump(companyEarnings,earningsFile,indent=4,ensure_ascii=False)

extract()

The extract() function acts as the entry point of the code. It asks you to select what to scrape and gives five choices.

The first four choices ask you what to scrape:

top gainers
top losers
trending stocks
stocks that reported their earnings that day

query = input("What to scrape? \n top-gainers [1]\n top-losers[2]\n trending stocks[3]\n stocks which reported earnings today[4]\n Insert a number (1,2,3, or 4)\n To cancel, enter 0.")

When you select the option four, the function calls earnings(). The fifth choice exits the program.

if query == "4":
        earnings()
    elif query == "0":
        return

For the other three choices, the function uses Python requests to scrape top gainers, top losers, and trending stocks. It

Sets the URL corresponding to the choice
Gets the data from that URL through Python requests
Saves the data to a JSON file

For the above process, extract() uses a switch().

else:
    match int(query):


        case 1:
            url = "https://api.stocktwits.com/api/2/symbols/stats/top_gainers.json?regions=US"
            name = "topGainers"


        case 2:
            url = "https://api.stocktwits.com/api/2/symbols/stats/top_losers.json?regions=US"
            name = "topLosers"


        case 3:
            url = "https://api-gw-prd.stocktwits.com/rankings/api/v1/rankings?identifier=US&amp;identifier-type=exchange-set&amp;limit=15&amp;page-num=1&amp;type=ts"
            name = "trending"


    response = requests.get(url,headers=headers)
    responseJson = json.loads(response.text)


    with open(f"{name}.json","w") as jsonFile:
        json.dump(responseJson,jsonFile,indent=4)

Finally, the function asks whether they need to restart the process afterward. If you choose yes, the extract() function gets called again—otherwise, the program exits.

more = input("Do you want to continue")


    if more.lower() == "yes":
        extract()
    else:
        return

Want to scrape financial data from Yahoo? Check this tutorial on how to scrape Yahoo Finance.

Code Limitations

Even though the code is suitable for scraping StockTwits data, you might need to alter it later. This is because StockTwits may change the site’s structure.

Or it might change the URL from which it fetches the stock data. In either case, the code may fail to execute.

Moreover, the code may not work for large-scale data extraction, as it doesn’t consider anti-scraping measures.

Wrapping Up

Using the code in this tutorial, you can scrape stock Market data from StockTwits with Python requests and Selenium.

However, you might need to change the code whenever StockTwits changes its website structure or the URL that delivers the stock data.

But you don’t have to change the code yourself; ScrapeHero can help you. We can take care of all your web scraping needs.

ScrapeHero is a full-service web scraping service provider capable of building enterprise-grade web scrapers and crawlers according to your specifications. ScrapeHero services include large-scale scraping and crawling, monitoring, and custom robotic process automation.

We can help with your data or automation needs

Turn the Internet into meaningful, structured and usable data

Please DO NOT contact us for any help with our Tutorials and Code using this form or by calling us, instead please add a comment to the bottom of the tutorial page for help