A simple Python app that can give you valuable insights into just about any topic you can think of
With about 6,000 tweets being sent out per second and 500 million posted per day, the average person could not even imagine trying to parse out all this data. Unless that average person is reading this article: then you might venture to try. Now we won’t be checking every tweet, but rather tweets of a bespoke nature. We can extract tweets that mention a specific keyword or phrase and with that information measure how the ‘twitter-verse’ feels about said topic or phrase.
This tutorial will be very noob-friendly; in fact, it won't require any prior coding experience to run and install. You just need a laptop and Python 3.8 — I'll explain the rest. If you've absolutely never touched Python, the most difficult part will be setting up your IDE with Python 3.8 and installing the required dependencies. This project uses much of the same code as my previous article; however, while that project was primarily focused on presenting and testing an idea, the final product of this project is an executable .py application.
Again, we’ll be using Python 3.8 for this project so if you’re using a previous version, you’ll need to set up a new virtual environment with 3.8 or update your current environment. Check your Python version with:
from platform import python_version
print(python_version())
To start out, as with all coding projects, let's import our dependencies:
import pandas as pd
import numpy as np
import csv
import snscrape.modules.twitter as sntwitter
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import datetime as dt
import time
If you receive an error installing any of these packages, just pip3 install them (for example pip3 install snscrape) or google how to install them.
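If you're starting from a fresh environment, the third-party packages above can be installed in one go (package names as published on PyPI; csv, datetime, and time ship with Python):

```shell
pip3 install pandas numpy snscrape vaderSentiment
```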
We initialize our date-time objects so that we pull data starting from the beginning of the previous day. This means that we aren’t specifically extracting tweets from the last 24 hours, rather we are extracting tweets from the start of the previous day and all tweets posted thus far today.
# Generating datetime objects
from datetime import datetime, timedelta

now = datetime.now()
now = now.strftime('%Y-%m-%d')
yesterday = datetime.now() - timedelta(days=1)
yesterday = yesterday.strftime('%Y-%m-%d')
Prompting the user to input a keyword:
keyword = input('Enter a topic or keyword, please: ')
Next, we scrape Twitter and write a unique CSV file, titled with the keyword from the user input and the date we run the code:
maxTweets = 80000

# Open/create a file to append data to
csvFile = open(keyword + '-sentiment-' + now + '.csv', 'a', newline='', encoding='utf8')

# Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id', 'date', 'tweet'])

for i, tweet in enumerate(sntwitter.TwitterSearchScraper(
        keyword + ' lang:en since:' + yesterday + ' until:' + now
        + ' -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])

csvFile.close()
With the tweets saved, we initialize VADER's sentiment analyzer:

analyzer = SentimentIntensityAnalyzer()
From here we read the CSV file back into our program, create columns containing the various sentiment scores of each individual extracted tweet, take the averages of each sentiment score for all tweets posted since the start of the previous day, and finally print out all this information.
# Reading the CSV file back into our program
df = pd.read_csv('/Users/Graham/data-works/sentiment-bot/' + keyword + '-sentiment-' + now + '.csv',
                 parse_dates=True, index_col=0)

# Creating sentiment score columns
df['compound'] = [analyzer.polarity_scores(x)['compound'] for x in df['tweet']]
df['neg'] = [analyzer.polarity_scores(x)['neg'] for x in df['tweet']]
df['neu'] = [analyzer.polarity_scores(x)['neu'] for x in df['tweet']]
df['pos'] = [analyzer.polarity_scores(x)['pos'] for x in df['tweet']]

# Taking averages of the sentiment score columns
avg_compound = np.average(df['compound'])
avg_neg = np.average(df['neg']) * -1  # Flip neg to a negative number for clarity
avg_neu = np.average(df['neu'])
avg_pos = np.average(df['pos'])

# Counting the number of tweets
count = len(df.index)

# Print statements
print("Since yesterday there have been", count, "tweets on " + keyword, end='\n*')
print("Positive Sentiment:", '%.2f' % avg_pos, end='\n*')
print("Neutral Sentiment:", '%.2f' % avg_neu, end='\n*')
print("Negative Sentiment:", '%.2f' % avg_neg, end='\n*')
print("Compound Sentiment:", '%.2f' % avg_compound, end='\n')
I also included a timer in my code that tells me how long it took to run the full code. You can find and download the full code on my Github:
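The timer itself isn't shown above, but a minimal sketch using the time module we already imported might look like this (the stand-in workload is purely illustrative; in the real script the scraping and analysis code sits between the two timestamps):

```python
import time

start = time.time()

# ... scraping and sentiment code would run here ...
total = sum(i * i for i in range(1000))  # stand-in workload for the demo

elapsed = time.time() - start
minutes, seconds = divmod(int(elapsed), 60)
print("Runtime:", minutes, "min", seconds, "sec")
```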
Running Our App
To run the code simply open your command terminal then navigate to the folder containing your app (in my case it’s in a folder within my data-works folder corresponding to my Github repo):
Then just call the function with the following code (NOTE: make sure your terminal is using your Python 3.8 virtual environment).
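As a sketch, assuming the script was saved as sentiment-bot.py inside the repo folder (both names here are illustrative, not prescribed by the article), the call would look something like:

```shell
cd ~/data-works/sentiment-bot
python3 sentiment-bot.py
```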
Below is a screenshot of what my terminal looks like after successfully running the program we just wrote:
In 5 minutes 17 seconds, we scraped just under 36,000 tweets containing the keyword "Biden", showing overall neutral sentiment with a slightly positive skew.
Getting Even More out of our Data:
While this information is interesting and valuable, we could push our analysis even further to produce more actionable insights. For example, if you were launching a new product that competed with an existing one, you could scrape Twitter for the negative feedback that product was getting online, then adjust your new product to remedy those complaints. There are tons of ways of working with user-generated data, and a Twitter scraping program is a great jumping-off point.
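For that complaint-mining idea, one sketch is to sort the scored DataFrame by compound score and read the most negative tweets directly. The toy DataFrame below stands in for the real scraped-and-scored one from earlier:

```python
import pandas as pd

# Toy stand-in for the scraped DataFrame after VADER scoring
df = pd.DataFrame({
    'tweet': ['love it', 'terrible battery', 'meh', 'worst app ever', 'pretty good'],
    'compound': [0.8, -0.6, 0.0, -0.85, 0.5],
})

# The two most negative tweets by compound score, worst first
worst = df.nsmallest(2, 'compound')
print(worst['tweet'].tolist())  # → ['worst app ever', 'terrible battery']
```

On the real data you'd raise the cutoff (say, nsmallest(50, 'compound')) and skim the results for recurring complaints.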