In this tutorial I will explain how to build a Reddit cryptocurrency sentiment indicator in Python. Sentiment analysis is the process of statistically determining whether a piece of text is positive, negative, or neutral. The program scans Reddit for mentions of different cryptocurrencies, ranks the keywords used in the comments, and then determines whether the overall sentiment is positive or negative.
The process will:
- Fetch the latest crypto tickers from CoinGecko
- Search Reddit for those crypto tickers
- Use VADER (a sentiment analysis tool) to check keywords in comments against a sentiment lexicon (dictionary)
- Grade keywords found in the lexicon as positive or negative
- Score more positive words with higher positive ratings and more negative words with lower negative ratings
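The lexicon idea behind the steps above can be sketched in a few lines of plain Python. This toy scorer is for illustration only; the real VADER algorithm also handles negation, punctuation, capitalization, and degree modifiers:

```python
# Toy lexicon-based scorer (illustration only, not the real VADER algorithm)
lexicon = {'moon': 4.0, 'hodl': 2.0, 'rekt': -4.0, 'bearish': -3.7}

def score(comment):
    # look up each word in the lexicon; unknown words contribute 0
    return sum(lexicon.get(word, 0.0) for word in comment.lower().split())

print(score("hodl and moon"))    # 6.0
print(score("got rekt today"))   # -4.0
```

A comment whose words sum to a positive number is graded positive, and vice versa; VADER additionally normalizes the sum into a compound score between -1 and 1.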
Reddit Sentiment Process
- Configure Python code
- Run code to generate sentiment analysis
- Review results
Python code to create sentiment analysis
First, create a Python file named sentiment_reddit_template.py and copy the code below into it. This template is used to generate the configuration file needed for sentiment analysis. In this file, modify the blacklist and the new words list:
- blacklist = an exclusion list of items that you do not want to include in your analysis
- new_words = the words you want to rank and their sentiment scores. Positive numbers represent positive sentiment and negative numbers represent negative sentiment.
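To see how the blacklist is meant to work, here is a minimal sketch (with made-up word lists) of the filtering that the main script performs later, keeping only all-caps words that are known tickers and not blacklisted:

```python
# Hypothetical, shortened lists for illustration only
blacklist = {'I', 'THE', 'A', 'FOR', 'LOL'}
crypto = {'BTC', 'ETH', 'ADA'}

words = ['I', 'BTC', 'LOL', 'ETH', 'banana', 'FOR']
# keep all-caps words that are real tickers and not in the exclusion list
matches = [w for w in words if w.isupper() and w not in blacklist and w in crypto]
print(matches)  # ['BTC', 'ETH']
```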
#sentiment_reddit_template.py
# use this template to generate the reddit sentiment config file which is used in the sentiment process
crypto = { {{ tickers }} }
# Exclude common words used on crypto reddit that are also crypto names
blacklist = {'I', 'WSB', 'THE', 'A', 'ROPE', 'YOLO', 'TOS', 'CEO', 'DD', 'IT', 'OPEN', 'ATH', 'PM', 'IRS', 'FOR','DEC', 'BE', 'IMO', 'ALL', 'RH', 'EV', 'TOS', 'CFO', 'CTO', 'DD', 'BTFD', 'WSB', 'OK', 'PDT', 'RH', 'KYS', 'FD', 'TYS', 'US', 'USA', 'IT', 'ATH', 'RIP', 'BMW', 'GDP', 'OTM', 'ATM', 'ITM', 'IMO', 'LOL', 'AM', 'BE', 'PR', 'PRAY', 'PT', 'FBI', 'SEC', 'GOD', 'NOT', 'POS', 'FOMO', 'TL;DR', 'EDIT', 'STILL', 'WTF', 'RAW', 'PM', 'LMAO', 'LMFAO', 'ROFL', 'EZ', 'RED', 'BEZOS', 'TICK', 'IS', 'PM', 'LPT', 'GOAT', 'FL', 'CA', 'IL', 'MACD', 'HQ', 'OP', 'PS', 'AH', 'TL', 'JAN', 'FEB', 'JUL', 'AUG', 'SEP', 'SEPT', 'OCT', 'NOV', 'FDA', 'IV', 'ER', 'IPO', 'MILF', 'BUT', 'SSN', 'FIFA', 'USD', 'CPU', 'AT', 'GG', 'Mar'}
# adding crypto reddit to vader to improve sentiment analysis, score: 4.0 to -4.0. Rank each keyword
# add new key words below that you would like to rank
new_words = {
'lambo': 4.0,
'rekt': -4.0,
'citron': -4.0,
'hidenburg': -4.0,
'moon': 4.0,
'Elon': 2.0,
'hodl': 2.0,
'highs': 2.0,
'mooning': 4.0,
'long': 2.0,
'short': -2.0,
'call': 4.0,
'calls': 4.0,
'put': -4.0,
'puts': -4.0,
'break': 2.0,
'tendie': 2.0,
'tendies': 2.0,
'town': 2.0,
'overvalued': -3.0,
'undervalued': 3.0,
'buy': 4.0,
'sell': -4.0,
'gone': -1.0,
'gtfo': -1.7,
'fomo': 2.0,
'paper': -1.7,
'bullish': 3.7,
'bearish': -3.7,
'bagholder': -1.7,
'stonk': 1.9,
'green': 1.9,
'money': 1.2,
'print': 2.2,
'rocket': 2.2,
'bull': 2.9,
'bear': -2.9,
'pumping': 1.0,
'sus': -3.0,
'offering': -2.3,
'rip': -4.0,
'downgrade': -3.0,
'upgrade': 3.0,
'maintain': 1.0,
'pump': 1.9,
'hot': 2,
'drop': -2.5,
'rebound': 1.5,
'crack': 2.5, }
Second, create a file called sentiment_reddit_ticker_generator.py and copy the code below into it. This script queries CoinGecko for the latest list of tickers and uses the template above to generate the actual configuration file (sentiment_reddit_config.py) needed by the sentiment program. This approach lets you refresh the ticker list on demand.
import urllib.request
import json
from jinja2 import Environment, FileSystemLoader

tickers = ''
# 250 tickers per page; range(1, 3) requests pages 1 and 2 from CoinGecko,
# i.e. up to 500 tickers. Increase the upper bound to fetch more pages.
for i in range(1, 3):
    endpoint = f'https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd&order=market_cap_desc&per_page=250&page={i}&sparkline=false'
    with urllib.request.urlopen(endpoint) as url:
        # parse the response as JSON
        data = json.loads(url.read().decode())
        for crypto in data:
            tickers = tickers + "'" + crypto['symbol'].upper() + "'" + ','

file_loader = FileSystemLoader('./')
env = Environment(loader=file_loader)
# dynamically render the template with the tickers so the output is in the correct format for the program
template = env.get_template('sentiment_reddit_template.py')
output = template.render(tickers=tickers)
# save the results in sentiment_reddit_config.py
with open("sentiment_reddit_config.py", "w") as fh:
    fh.write(output)
The code above generates the actual configuration file, sentiment_reddit_config.py, which is used by the sentiment analysis process. It contains the latest list of crypto tickers from CoinGecko plus the sentiment words to rank.
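As a sanity check, the string substitution the generator performs can be reproduced with a few hypothetical API rows and plain string operations (the real script gets these rows from the CoinGecko /coins/markets endpoint):

```python
# Hypothetical CoinGecko rows for illustration
data = [{'symbol': 'btc'}, {'symbol': 'eth'}, {'symbol': 'ada'}]

# same string-building loop as the generator script
tickers = ''
for crypto in data:
    tickers = tickers + "'" + crypto['symbol'].upper() + "'" + ','

# this is what the {{ tickers }} placeholder in the template expands to
rendered = "crypto = { " + tickers + " }"
print(rendered)  # crypto = { 'BTC','ETH','ADA', }
```

The trailing comma before the closing brace is harmless: Python set literals allow it.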
# all crypto tickers fetched from coingecko
# this config file is used in the sentiment process
crypto = { 'BTC','ETH','BNB','XRP','USDT','ADA','DOGE','DOT','UNI','LTC','BCH','LINK','VET','USDC','XLM','SOL','THETA','FIL','TRX','WBTC','BUSD','XMR','LUNA','NEO','KLAY','MIOTA','EOS','AAVE','CAKE','ATOM','BSV','CRO','BTT','OKB','FTT','MATIC','ETC','CUSDC','CETH','XTZ','MKR','ALGO','AVAX','KSM','DAI','RUNE','CDAI','HT','COMP','EGLD','XEM','HOT','DASH','CHZ','DCR','ZEC','SNX','ZIL','HBAR','STX','ENJ','CEL','LEO','SUSHI','AMP','WAVES','NEXO','SC','UST','DGB','GRT','FEI','NEAR','BAT','MANA','YFI','BTG','ARRR','UMA','QTUM','HBTC','RVN','LUSD','ONT','ZRX','ICX','HNT','ONE','ZEN','WRX','AR','FTM','FLOW','BNT','IOST','RSR','OMI','XDC','DENT','NANO','CHSB','PAX','ANKR','WIN','OMG','KCS','VGX','PUNDIX','CRV','XSUSHI','HUSD','CFX','XVG','BCHA','NPXS','1INCH','REN','XVS','LSK','NXM','LPT','VTHO','SNT','OXY','LRC','STETH','BTMX','CKB','RENBTC','BAL','GT','MIR','OCEAN','BOT','CELO','ZKS','CUSDT','TRIBE','BTCST','QNT','RAY','AXS','SRM','EWT','BAND','REEF','GLM','WOO','STMX','NKN','CUNI','KNCL','BCD','MAID','TON','FET','OGN','TLM','DODO','MED','SKL','AMPL','ARDR','KIN','TEL','CELR','ETN','HBC','AGI','NMR','SAND','WAXP','AUDIO','SXP','ALPHA','RFOX','CVC','MDX','ORN','BAKE','STEEM','FUN','KMD','POLY','TUSD','ARK','BTM','BTS','NOIA','ORBS','META','IOTX','ANT','USDN','AKT','SETH','KAVA','KLV','STORJ','RPL','XHV','GNO','GHX','WAN','ANC','VLX','SWAP','ERSDL','BADGER','UOS','FORTH','AVA','UBT','UQC','ALCX','HNS','COTI','NWC','MTL','LEND','SFP','HIVE','TITAN','RDD','HTR','RUNE','MATH','UTK','INJ','VRA','PAID','REP','KAI','BUNNY','KOBE','IQ','ROSE','LINA','MONA','TWT','RIF','KEEP','ELF','DNT','STRAX','SCRT','SUPER','ZMT','OHM','TRAC','GAS','POLS','CZRX','RLC','QKC','SYS','SUSD','TKO','CTSI','CRU','MASK','WNXM','CRE','ATRI','PERP','POWR','TOMO','ERN','XOR','PPT','JST','ROOK','VAI','VRSC','ALICE','EPS','FX','MFT','DAG','AION','PHA','EXRD','EDG','ADX','NU','DIA','RGT','API3','LAMB','PRQ','CBAT','LIT','LYXE','BCN','STETH','AE','GALA','DPI','IRIS','RNDR','SHR','DDX','ELA','PAC','HXRO','MLN','GNY','XAUT','RAMP','LTO','POND','C20','DAO','XCM','CHR','TRB','TT','ERG','AKRO','MAPS','FIRO','AUCTION','ZNN','MX','QUICK','EMC2','NMX','NRG','LOOMOLD','BSCPAD','DIVI','LON','NULS','IGNIS','DSLA','KDA','WOZX','SRK','CTK','ALBT','BEL','BOA','BAR','LBC','USDP','BEAM','MXC','HOGE','VSP','FREE','DUCK','FRAX','SFI','SOLVE','BLZ','COL','SPI','RFR','SERO','RLY','GRS','ID','LOC','ALPACA','PSG','GUSD','BZRX','DATA','SAFEMARS','DRGN','TORN','OXEN','WICC','PAXG','PIVX','AETH','SURE','VITE','HARD','FARM','SLT','GET','REQ','BIFI','TVK','WHALE','YFII','PCX','OM','MRPH','AERGO','HYDRA','COS','DUSK','VSYS','NXS','STAKE','DERO','VXV','MHC','VTC','ARPA','YLD','SWTH','CHAIN','PNK','STPT','BFC','XPRT','APL','HC','YCC','RAI','PIB','TRU','LGO','ESD','NEST','CREAM','FRONT','PHB','SBTC','WOW','ARMOR','VAL','SUKU','RAD','VETH','FIDA','NIM','NRV','FEG','DEGO','LAYER','BOSON','FIO','BELT','IDEX','VISR','SPARTA','SWINGBY','PNT','NBR','ZERO','COPE','MITH','ZAI','FRM','CFI','PROM','FSN','DF','HELMET','DEXT','MTA','BAO','CND','AST','BMI','FXF','ECO','HAI','HEGIC','DG','CARDS','LQTY','KP3R','WING','RDN','AIOZ','RARI','DMT','TBTC','KYL','AUTO','MBL','DCN','BONDLY','FIS','BIP','NFTX','SKY','BDPI','FXS','NXT','BOR','GXC','UFT','RING','DOCK','CORE','INDEX','VID','DEXE','CONV','RCN','SBD','UNFI','GBYTE','ZEE', }
# Excludes common words and words used on crypto reddit that are also crypto names
blacklist = {'I', 'WSB', 'THE', 'A', 'ROPE', 'YOLO', 'TOS', 'CEO', 'DD', 'IT', 'OPEN', 'ATH', 'PM', 'IRS', 'FOR','DEC', 'BE', 'IMO', 'ALL', 'RH', 'EV', 'TOS', 'CFO', 'CTO', 'DD', 'BTFD', 'WSB', 'OK', 'PDT', 'RH', 'KYS', 'FD', 'TYS', 'US', 'USA', 'IT', 'ATH', 'RIP', 'BMW', 'GDP', 'OTM', 'ATM', 'ITM', 'IMO', 'LOL', 'AM', 'BE', 'PR', 'PRAY', 'PT', 'FBI', 'SEC', 'GOD', 'NOT', 'POS', 'FOMO', 'TL;DR', 'EDIT', 'STILL', 'WTF', 'RAW', 'PM', 'LMAO', 'LMFAO', 'ROFL', 'EZ', 'RED', 'BEZOS', 'TICK', 'IS', 'PM', 'LPT', 'GOAT', 'FL', 'CA', 'IL', 'MACD', 'HQ', 'OP', 'PS', 'AH', 'TL', 'JAN', 'FEB', 'JUL', 'AUG', 'SEP', 'SEPT', 'OCT', 'NOV', 'FDA', 'IV', 'ER', 'IPO', 'MILF', 'BUT', 'SSN', 'FIFA', 'USD', 'CPU', 'AT', 'GG', 'Mar'}
# adding crypto reddit to vader to improve sentiment analysis, score: 4.0 to -4.0. Rank each keyword
new_words = {
'lambo': 4.0,
'rekt': -4.0,
'citron': -4.0,
'hidenburg': -4.0,
'moon': 4.0,
'Elon': 2.0,
'hodl': 2.0,
'highs': 2.0,
'mooning': 4.0,
'long': 2.0,
'short': -2.0,
'call': 4.0,
'calls': 4.0,
'put': -4.0,
'puts': -4.0,
'break': 2.0,
'tendie': 2.0,
'tendies': 2.0,
'town': 2.0,
'overvalued': -3.0,
'undervalued': 3.0,
'buy': 4.0,
'sell': -4.0,
'gone': -1.0,
'gtfo': -1.7,
'fomo': 2.0,
'paper': -1.7,
'bullish': 3.7,
'bearish': -3.7,
'bagholder': -1.7,
'stonk': 1.9,
'green': 1.9,
'money': 1.2,
'print': 2.2,
'rocket': 2.2,
'bull': 2.9,
'bear': -2.9,
'pumping': 1.0,
'sus': -3.0,
'offering': -2.3,
'rip': -4.0,
'downgrade': -3.0,
'upgrade': 3.0,
'maintain': 1.0,
'pump': 1.9,
'hot': 2,
'drop': -2.5,
'rebound': 1.5,
'crack': 2.5, }
Finally, create a file called sentiment_reddit.py and copy the code below into it. This script scans Reddit and uses the configuration file above to generate sentiment data. Log into your Reddit account and retrieve your client ID and client secret; these allow the program to log into Reddit programmatically and scan subreddits for information.
- client_id="xxxxxxxxxxxxxxxxxxxxxxx",
- client_secret="xxxxxxxxxxxxxxxxxxxxxxx"
import praw
import time
import pandas as pd
import logging
import threading
import matplotlib.pyplot as plt
import squarify
from sentiment_reddit_config import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk
nltk.downloader.download('vader_lexicon')
'''*****************************************************************************
This program uses the VADER SentimentIntensityAnalyzer to calculate a compound
sentiment value per ticker/token.
Limitations:
It depends mainly on the parameters defined below. It completely ignores heavily
downvoted comments, so at times the most mentioned ticker could be heavily
downvoted; you can change that behaviour via the upvotes variable.
*****************************************************************************'''
start_time = time.time()
reddit = praw.Reddit(
    user_agent="Comment Extraction",
    # replace with the credentials from your Reddit account
    client_id="xxxxxxxxxxxxxxxxxxxxxx",
    client_secret="xxxxxxxxxxxxxxxxxxx"
)
logging.info('logged into Reddit')
print('logged into Reddit')
def sentiment_reddit():
    # rerun the analysis every 300 seconds (5 minutes)
    threading.Timer(300, sentiment_reddit).start()
    # set the program parameters
    subs = ['CryptoCurrency', 'CryptoMarkets', 'EthTrader', 'Investing', 'Crypto_General', 'Bitcoin', 'CryptoCurrencyTrading', 'Coinbase', 'wallstreetbets']  # subreddits to search
    post_flairs = {'Daily Discussion', 'Weekend Discussion', 'Discussion'}  # post flairs to search || a None flair is automatically considered
    goodAuth = {'AutoModerator'}  # authors whose comments are allowed more than once
    uniqueCmt = True  # allow one comment per author per symbol
    ignoreAuthP = {'example'}  # authors to ignore for posts
    ignoreAuthC = {'example'}  # authors to ignore for comments
    upvoteRatio = 0.70  # upvote ratio for a post to be considered, 0.70 = 70%
    ups = 20  # a post is considered if its upvotes exceed this number
    limit = 10  # comment 'replace more' limit
    upvotes = 2  # a comment is considered if its upvotes exceed this number
    picks = 200  # number of picks, prints as "Top ## picks are:"
    picks_ayz = 200  # number of picks for sentiment analysis
    posts, count, c_analyzed, tickers, titles, a_comments = 0, 0, 0, {}, [], {}
    cmt_auth = {}
    for sub in subs:
        subreddit = reddit.subreddit(sub)
        hot_python = subreddit.hot()  # sorting posts by hot
        # extracting comments and symbols from the subreddit
        for submission in hot_python:
            flair = submission.link_flair_text
            author = submission.author
            # checking: post upvote ratio, number of upvotes, post flair, and author
            if submission.upvote_ratio >= upvoteRatio and submission.ups > ups and (
                    flair in post_flairs or flair is None) and author not in ignoreAuthP:
                submission.comment_sort = 'new'
                comments = submission.comments
                titles.append(submission.title)
                posts += 1
                submission.comments.replace_more(limit=limit)
                for comment in comments:
                    # skip comments whose author account was deleted
                    try:
                        auth = comment.author.name
                    except AttributeError:
                        continue
                    c_analyzed += 1
                    # checking: comment upvotes and author
                    if comment.score > upvotes and auth not in ignoreAuthC:
                        split = comment.body.split(" ")
                        for word in split:
                            word = word.replace("$", "")
                            # all upper case = ticker, ticker length <= 5, not an excluded word
                            if word.isupper() and len(word) <= 5 and word not in blacklist and word in crypto:
                                # unique comments, try/except for key errors
                                if uniqueCmt and auth not in goodAuth:
                                    try:
                                        if auth in cmt_auth[word]: break
                                    except KeyError:
                                        pass
                                # counting tickers
                                if word in tickers:
                                    tickers[word] += 1
                                    a_comments[word].append(comment.body)
                                    cmt_auth[word].append(auth)
                                    count += 1
                                else:
                                    tickers[word] = 1
                                    cmt_auth[word] = [auth]
                                    a_comments[word] = [comment.body]
                                    count += 1
    # sort the tickers by number of mentions
    symbols = dict(sorted(tickers.items(), key=lambda item: item[1], reverse=True))
    top_picks = list(symbols.keys())[0:picks]
    # elapsed = (time.time() - start_time)
    # print top picks
    # print("It took {t:.2f} seconds to analyze {c} comments in {p} posts in {s} subreddits.\n".format(t=elapsed, c=c_analyzed, p=posts, s=len(subs)))
    print("Posts analyzed saved in titles")
    # for i in titles: print(i)  # prints the titles of the posts analyzed
    logging.info(top_picks)
    print(f"\n{picks} most mentioned picks: ")
    times = []
    top = []
    for i in top_picks:
        print(f"{i}: {symbols[i]}")
        times.append(symbols[i])
        top.append(f"{i}: {symbols[i]}")
    # applying sentiment analysis
    scores, s = {}, {}
    vader = SentimentIntensityAnalyzer()
    # adding the custom words from the config file
    vader.lexicon.update(new_words)
    picks_sentiment = list(symbols.keys())[0:picks_ayz]
    for symbol in picks_sentiment:
        stock_comments = a_comments[symbol]
        for cmnt in stock_comments:
            score = vader.polarity_scores(cmnt)
            if symbol in s:
                s[symbol][cmnt] = score
            else:
                s[symbol] = {cmnt: score}
            if symbol in scores:
                for key, _ in score.items():
                    scores[symbol][key] += score[key]
            else:
                scores[symbol] = score
        # calculating the average score per symbol
        for key in score:
            scores[symbol][key] = scores[symbol][key] / symbols[symbol]
            scores[symbol][key] = "{pol:.3f}".format(pol=scores[symbol][key])
    # printing the sentiment analysis
    print(f"\nSentiment analysis of top {picks_ayz} picks:")
    df = pd.DataFrame(scores)
    df.index = ['Bearish', 'Neutral', 'Bullish', 'Total/Compound']
    df = df.T
    # log the dataframe
    logging.info('dataframe head - {}'.format(df.to_string()))
    print(df)

# start the first run; threading.Timer only schedules the subsequent runs
sentiment_reddit()
Improvements to the Python Reddit sentiment analysis
Take this code to the next level and modify the script above to implement additional enhancements:
- Generate a graph or report
- Modify the code to place a trade on the Ethereum blockchain
- Email the data or present it visually on a website
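For the first improvement, a chart of ticker mentions is a small step up from the printed list. This sketch assumes matplotlib is installed and uses made-up counts in place of the `symbols` dictionary the script builds:

```python
import matplotlib
matplotlib.use('Agg')  # render without a display, so this also works on a server
import matplotlib.pyplot as plt

# made-up mention counts; in the script this would be the sorted `symbols` dict
symbols = {'BTC': 52, 'ETH': 34, 'DOGE': 21, 'ADA': 9}

plt.bar(symbols.keys(), symbols.values())
plt.title('Most mentioned tickers')
plt.ylabel('Mentions')
plt.savefig('mentions.png')
```

The same approach works for the averaged sentiment DataFrame: call `df.plot(kind='bar')` before saving the figure.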
This code is for learning and entertainment purposes only. It has not been audited; use it at your own risk. Remember that smart contracts are experimental and could contain bugs.