Reddit sentiment indicator for crypto in Python

In this tutorial I will explain how to build a Reddit cryptocurrency sentiment indicator in Python. Sentiment analysis is the process of statistically determining whether a piece of text is positive, negative, or neutral. The program scans Reddit for mentions of different cryptocurrencies, ranks the keywords used in comments, and then determines whether the sentiment is positive or negative.

The process will:

  1. Fetch the latest crypto tickers from CoinGecko
  2. Search Reddit for these crypto tickers
  3. VADER (a sentiment analysis tool) checks keywords in comments against a keyword lexicon (dictionary)
  4. Keywords found in the lexicon are graded as positive or negative
  5. More positive words receive higher positive ratings and more negative words receive lower negative ratings
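The scoring idea in steps 3-5 can be sketched with a toy example. This is a simplified, hypothetical scorer for illustration only; the real VADER analyzer also handles negation, punctuation, and emphasis:

```python
# Toy lexicon-based scorer (hypothetical, simplified illustration of VADER's idea).
lexicon = {'moon': 4.0, 'hodl': 2.0, 'rekt': -4.0, 'bearish': -3.7}

def toy_score(comment):
    """Sum the lexicon ratings of every known word in the comment."""
    words = (w.strip('.,!?').lower() for w in comment.split())
    return sum(lexicon.get(w, 0.0) for w in words)

print(toy_score("BTC to the moon, hodl!"))    # 6.0 -> positive overall
print(toy_score("totally rekt, so bearish"))  # negative overall
```

A comment with more positive lexicon words ends up with a higher total, which is the same intuition behind the VADER lexicon used below.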

Reddit Sentiment Process

  • Configure Python code
  • Run code to generate sentiment analysis
  • Review results

Python code to create sentiment analysis

First, create a Python file named sentiment_reddit_template.py and copy the code below into it. This template generates the configuration file needed for sentiment analysis. In this file, modify the blacklist and the new_words list.

  • blacklist = This is an exclusion list of items that you do not want to include in your analysis
  • new_words = This list contains the words you want to rank and their sentiment. Positive numbers represent a positive sentiment and negative numbers represent a negative sentiment.
#sentiment_reddit_template.py
# use this template to generate the reddit sentiment config file which is used in the sentiment process
crypto = { {{ tickers }} }

# Exclude common words used on crypto reddit that are also crypto names
blacklist = {'I', 'WSB', 'THE', 'A', 'ROPE', 'YOLO', 'TOS', 'CEO', 'DD', 'IT', 'OPEN', 'ATH', 'PM', 'IRS', 'FOR','DEC', 'BE', 'IMO', 'ALL', 'RH', 'EV', 'TOS', 'CFO', 'CTO', 'DD', 'BTFD', 'WSB', 'OK', 'PDT', 'RH', 'KYS', 'FD', 'TYS', 'US', 'USA', 'IT', 'ATH', 'RIP', 'BMW', 'GDP', 'OTM', 'ATM', 'ITM', 'IMO', 'LOL', 'AM', 'BE', 'PR', 'PRAY', 'PT', 'FBI', 'SEC', 'GOD', 'NOT', 'POS', 'FOMO', 'TL;DR', 'EDIT', 'STILL', 'WTF', 'RAW', 'PM', 'LMAO', 'LMFAO', 'ROFL', 'EZ', 'RED', 'BEZOS', 'TICK', 'IS', 'PM', 'LPT', 'GOAT', 'FL', 'CA', 'IL', 'MACD', 'HQ', 'OP', 'PS', 'AH', 'TL', 'JAN', 'FEB', 'JUL', 'AUG', 'SEP', 'SEPT', 'OCT', 'NOV', 'FDA', 'IV', 'ER', 'IPO', 'MILF', 'BUT', 'SSN', 'FIFA', 'USD', 'CPU', 'AT', 'GG', 'Mar'}


# adding crypto reddit to vader to improve sentiment analysis, score: 4.0 to -4.0. Rank each keyword
# add new key words below that you would like to rank

new_words = {
    'lambo': 4.0,
    'rekt': -4.0,
    'citron': -4.0,
    'hidenburg': -4.0,
    'moon': 4.0,
    'Elon': 2.0,
    'hodl': 2.0,
    'highs': 2.0,
    'mooning': 4.0,
    'long': 2.0,
    'short': -2.0,
    'call': 4.0,
    'calls': 4.0,
    'put': -4.0,
    'puts': -4.0,
    'break': 2.0,
    'tendie': 2.0,
    'tendies': 2.0,
    'town': 2.0,
    'overvalued': -3.0,
    'undervalued': 3.0,
    'buy': 4.0,
    'sell': -4.0,
    'gone': -1.0,
    'gtfo': -1.7,
    'fomo': 2.0,
    'paper': -1.7,
    'bullish': 3.7,
    'bearish': -3.7,
    'bagholder': -1.7,
    'stonk': 1.9,
    'green': 1.9,
    'money': 1.2,
    'print': 2.2,
    'rocket': 2.2,
    'bull': 2.9,
    'bear': -2.9,
    'pumping': 1.0,
    'sus': -3.0,
    'offering': -2.3,
    'rip': -4.0,
    'downgrade': -3.0,
    'upgrade': 3.0,
    'maintain': 1.0,
    'pump': 1.9,
    'hot': 2,
    'drop': -2.5,
    'rebound': 1.5,
    'crack': 2.5, }
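The scanner script shown later uses blacklist together with the generated crypto set to decide whether an uppercase word in a comment counts as a ticker mention. The filtering rule can be sketched in isolation (the small crypto set and the candidate_tickers helper here are illustrative only; in the real script both sets come from sentiment_reddit_config.py):

```python
# Sketch of the ticker filter applied to each comment (illustrative data only).
crypto = {'BTC', 'ETH', 'DOGE', 'USD'}
blacklist = {'USD', 'LOL', 'CEO'}

def candidate_tickers(comment):
    """Yield uppercase words of <= 5 chars that are known tickers and not blacklisted."""
    for word in comment.split():
        word = word.replace('$', '')  # '$BTC' and 'BTC' count the same
        if word.isupper() and len(word) <= 5 and word not in blacklist and word in crypto:
            yield word

print(list(candidate_tickers("Bought $BTC and DOGE with USD LOL")))  # ['BTC', 'DOGE']
```

Note how 'USD' is a real ticker but is excluded by the blacklist, since it is far more often used as an ordinary word than as a coin mention.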

Second, create a file called sentiment_reddit_ticker_generator.py and copy the code below into it. This script queries CoinGecko for the latest list of tickers and uses the template above to generate the actual configuration file (sentiment_reddit_config.py) needed by the sentiment program. This process lets you refresh the ticker list on demand.

import urllib.request
import json
from jinja2 import Environment, FileSystemLoader


tickers = ''
for i in range(1, 3):
    # 250 tickers per page; requesting 2 pages from CoinGecko gives the top 500 tickers
    endpoint = f'https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd&order=market_cap_desc&per_page=250&page={i}&sparkline=false'

    with urllib.request.urlopen(endpoint) as url:
        # convert to a json
        data = json.loads(url.read().decode())
        for crypto in data:
            tickers = tickers + "'" + crypto['symbol'].upper() + "'" + ','
file_loader = FileSystemLoader('./')
env = Environment(loader=file_loader)
# dynamically create a file of tickers using the template file so it is in the correct format for the program
template = env.get_template('sentiment_reddit_template.py')
output = template.render(tickers=tickers)
# save the results in sentiment_reddit_config.py
with open("sentiment_reddit_config.py", "w") as fh:
    fh.write(output)

The code above generates the configuration file named sentiment_reddit_config.py, which is used by the sentiment analysis process. It contains the latest list of crypto tickers from CoinGecko plus the sentiment words to rank.

# all crypto tickers fetched from coingecko
# this config file is used in the sentiment process
crypto = { 'BTC','ETH','BNB','XRP','USDT','ADA','DOGE','DOT','UNI','LTC','BCH','LINK','VET','USDC','XLM','SOL','THETA','FIL','TRX','WBTC','BUSD','XMR','LUNA','NEO','KLAY','MIOTA','EOS','AAVE','CAKE','ATOM','BSV','CRO','BTT','OKB','FTT','MATIC','ETC','CUSDC','CETH','XTZ','MKR','ALGO','AVAX','KSM','DAI','RUNE','CDAI','HT','COMP','EGLD','XEM','HOT','DASH','CHZ','DCR','ZEC','SNX','ZIL','HBAR','STX','ENJ','CEL','LEO','SUSHI','AMP','WAVES','NEXO','SC','UST','DGB','GRT','FEI','NEAR','BAT','MANA','YFI','BTG','ARRR','UMA','QTUM','HBTC','RVN','LUSD','ONT','ZRX','ICX','HNT','ONE','ZEN','WRX','AR','FTM','FLOW','BNT','IOST','RSR','OMI','XDC','DENT','NANO','CHSB','PAX','ANKR','WIN','OMG','KCS','VGX','PUNDIX','CRV','XSUSHI','HUSD','CFX','XVG','BCHA','NPXS','1INCH','REN','XVS','LSK','NXM','LPT','VTHO','SNT','OXY','LRC','STETH','BTMX','CKB','RENBTC','BAL','GT','MIR','OCEAN','BOT','CELO','ZKS','CUSDT','TRIBE','BTCST','QNT','RAY','AXS','SRM','EWT','BAND','REEF','GLM','WOO','STMX','NKN','CUNI','KNCL','BCD','MAID','TON','FET','OGN','TLM','DODO','MED','SKL','AMPL','ARDR','KIN','TEL','CELR','ETN','HBC','AGI','NMR','SAND','WAXP','AUDIO','SXP','ALPHA','RFOX','CVC','MDX','ORN','BAKE','STEEM','FUN','KMD','POLY','TUSD','ARK','BTM','BTS','NOIA','ORBS','META','IOTX','ANT','USDN','AKT','SETH','KAVA','KLV','STORJ','RPL','XHV','GNO','GHX','WAN','ANC','VLX','SWAP','ERSDL','BADGER','UOS','FORTH','AVA','UBT','UQC','ALCX','HNS','COTI','NWC','MTL','LEND','SFP','HIVE','TITAN','RDD','HTR','RUNE','MATH','UTK','INJ','VRA','PAID','REP','KAI','BUNNY','KOBE','IQ','ROSE','LINA','MONA','TWT','RIF','KEEP','ELF','DNT','STRAX','SCRT','SUPER','ZMT','OHM','TRAC','GAS','POLS','CZRX','RLC','QKC','SYS','SUSD','TKO','CTSI','CRU','MASK','WNXM','CRE','ATRI','PERP','POWR','TOMO','ERN','XOR','PPT','JST','ROOK','VAI','VRSC','ALICE','EPS','FX','MFT','DAG','AION','PHA','EXRD','EDG','ADX','NU','DIA','RGT','API3','LAMB','PRQ','CBAT','LIT','LYXE','BCN','STETH','AE','GALA','DPI','IRIS','RNDR','SHR','DDX','ELA','PAC','HXRO','MLN','GNY','XAUT','RAMP','LTO','POND','C20','DAO','XCM','CHR','TRB','TT','ERG','AKRO','MAPS','FIRO','AUCTION','ZNN','MX','QUICK','EMC2','NMX','NRG','LOOMOLD','BSCPAD','DIVI','LON','NULS','IGNIS','DSLA','KDA','WOZX','SRK','CTK','ALBT','BEL','BOA','BAR','LBC','USDP','BEAM','MXC','HOGE','VSP','FREE','DUCK','FRAX','SFI','SOLVE','BLZ','COL','SPI','RFR','SERO','RLY','GRS','ID','LOC','ALPACA','PSG','GUSD','BZRX','DATA','SAFEMARS','DRGN','TORN','OXEN','WICC','PAXG','PIVX','AETH','SURE','VITE','HARD','FARM','SLT','GET','REQ','BIFI','TVK','WHALE','YFII','PCX','OM','MRPH','AERGO','HYDRA','COS','DUSK','VSYS','NXS','STAKE','DERO','VXV','MHC','VTC','ARPA','YLD','SWTH','CHAIN','PNK','STPT','BFC','XPRT','APL','HC','YCC','RAI','PIB','TRU','LGO','ESD','NEST','CREAM','FRONT','PHB','SBTC','WOW','ARMOR','VAL','SUKU','RAD','VETH','FIDA','NIM','NRV','FEG','DEGO','LAYER','BOSON','FIO','BELT','IDEX','VISR','SPARTA','SWINGBY','PNT','NBR','ZERO','COPE','MITH','ZAI','FRM','CFI','PROM','FSN','DF','HELMET','DEXT','MTA','BAO','CND','AST','BMI','FXF','ECO','HAI','HEGIC','DG','CARDS','LQTY','KP3R','WING','RDN','AIOZ','RARI','DMT','TBTC','KYL','AUTO','MBL','DCN','BONDLY','FIS','BIP','NFTX','SKY','BDPI','FXS','NXT','BOR','GXC','UFT','RING','DOCK','CORE','INDEX','VID','DEXE','CONV','RCN','SBD','UNFI','GBYTE','ZEE', }

# Excludes common words and words used on crypto reddit that are also crypto names
blacklist = {'I', 'WSB', 'THE', 'A', 'ROPE', 'YOLO', 'TOS', 'CEO', 'DD', 'IT', 'OPEN', 'ATH', 'PM', 'IRS', 'FOR','DEC', 'BE', 'IMO', 'ALL', 'RH', 'EV', 'TOS', 'CFO', 'CTO', 'DD', 'BTFD', 'WSB', 'OK', 'PDT', 'RH', 'KYS', 'FD', 'TYS', 'US', 'USA', 'IT', 'ATH', 'RIP', 'BMW', 'GDP', 'OTM', 'ATM', 'ITM', 'IMO', 'LOL', 'AM', 'BE', 'PR', 'PRAY', 'PT', 'FBI', 'SEC', 'GOD', 'NOT', 'POS', 'FOMO', 'TL;DR', 'EDIT', 'STILL', 'WTF', 'RAW', 'PM', 'LMAO', 'LMFAO', 'ROFL', 'EZ', 'RED', 'BEZOS', 'TICK', 'IS', 'PM', 'LPT', 'GOAT', 'FL', 'CA', 'IL', 'MACD', 'HQ', 'OP', 'PS', 'AH', 'TL', 'JAN', 'FEB', 'JUL', 'AUG', 'SEP', 'SEPT', 'OCT', 'NOV', 'FDA', 'IV', 'ER', 'IPO', 'MILF', 'BUT', 'SSN', 'FIFA', 'USD', 'CPU', 'AT', 'GG', 'Mar'}


# adding crypto reddit to vader to improve sentiment analysis, score: 4.0 to -4.0. Rank each keyword

new_words = {
    'lambo': 4.0,
    'rekt': -4.0,
    'citron': -4.0,
    'hidenburg': -4.0,
    'moon': 4.0,
    'Elon': 2.0,
    'hodl': 2.0,
    'highs': 2.0,
    'mooning': 4.0,
    'long': 2.0,
    'short': -2.0,
    'call': 4.0,
    'calls': 4.0,
    'put': -4.0,
    'puts': -4.0,
    'break': 2.0,
    'tendie': 2.0,
    'tendies': 2.0,
    'town': 2.0,
    'overvalued': -3.0,
    'undervalued': 3.0,
    'buy': 4.0,
    'sell': -4.0,
    'gone': -1.0,
    'gtfo': -1.7,
    'fomo': 2.0,
    'paper': -1.7,
    'bullish': 3.7,
    'bearish': -3.7,
    'bagholder': -1.7,
    'stonk': 1.9,
    'green': 1.9,
    'money': 1.2,
    'print': 2.2,
    'rocket': 2.2,
    'bull': 2.9,
    'bear': -2.9,
    'pumping': 1.0,
    'sus': -3.0,
    'offering': -2.3,
    'rip': -4.0,
    'downgrade': -3.0,
    'upgrade': 3.0,
    'maintain': 1.0,
    'pump': 1.9,
    'hot': 2,
    'drop': -2.5,
    'rebound': 1.5,
    'crack': 2.5, }
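VADER lexicon ratings are expected to fall between -4.0 (most negative) and 4.0 (most positive). A quick sanity check (a sketch, run over an excerpt of the new_words dictionary) catches out-of-range ratings before the config is used:

```python
# Sanity-check custom lexicon ratings: VADER valence scores range from -4.0 to 4.0.
new_words = {'lambo': 4.0, 'rekt': -4.0, 'hot': 2, 'drop': -2.5}  # excerpt from the config

out_of_range = {w: s for w, s in new_words.items() if not -4.0 <= s <= 4.0}
assert not out_of_range, f"ratings outside VADER's range: {out_of_range}"
print(f"all {len(new_words)} custom ratings are within range")
```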

Finally, create a file called sentiment_reddit.py and copy the code below into it. This script scans Reddit and uses the configuration file above to generate sentiment data. Log into your Reddit account and retrieve your client ID and client secret; these allow the program to log in to Reddit programmatically and scan subreddits for information.

  • client_id="xxxxxxxxxxxxxxxxxxxxxxx",
  • client_secret="xxxxxxxxxxxxxxxxxxxxxxx"
import praw
import time
import pandas as pd
import logging
import threading
import matplotlib.pyplot as plt
import squarify
from sentiment_reddit_config import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk
nltk.downloader.download('vader_lexicon')


'''*****************************************************************************
This program uses the VADER SentimentIntensityAnalyzer to calculate a compound
sentiment value per ticker/token.
Limitations:
The results depend mainly on the parameters defined below. Heavily down-voted
comments are ignored entirely, so there can be times when the most-mentioned
ticker is heavily down-voted; change the upvotes variable to adjust that.
*****************************************************************************'''

start_time = time.time()
reddit = praw.Reddit(
    user_agent="Comment Extraction",

# replace with information from your Reddit account

    client_id="xxxxxxxxxxxxxxxxxxxxxx",
    client_secret="xxxxxxxxxxxxxxxxxxx"
)
logging.info('logged into Reddit')
print('logged into Reddit')


def sentiment_reddit():
    # re-run this function every 300 seconds (5 minutes)
    threading.Timer(300, sentiment_reddit).start()

    # set the program parameters
    subs = ['CryptoCurrency', 'CryptoMarkets', 'EthTrader', 'Investing', 'Crypto_General', 'Bitcoin', 'CryptoCurrencyTrading', 'Coinbase', 'wallstreetbets']  # sub-reddit to search
    post_flairs = {'Daily Discussion', 'Weekend Discussion', 'Discussion'}  # posts flairs to search || None flair is automatically considered
    goodAuth = {'AutoModerator'}  # authors whom comments are allowed more than once
    uniqueCmt = True  # allow one comment per author per symbol
    ignoreAuthP = {'example'}  # authors to ignore for posts
    ignoreAuthC = {'example'}  # authors to ignore for comment
    upvoteRatio = 0.70  # upvote ratio for post to be considered, 0.70 = 70%
    ups = 20  # define # of up votes, post is considered if up votes exceed this #
    limit = 10  # define the limit, comments 'replace more' limit
    upvotes = 2  # define # of up votes, comment is considered if up votes exceed this #
    picks = 200  # define # of picks here, prints as "Top ## picks are:"
    picks_ayz = 200  # define # of picks for sentiment analysis

    posts, count, c_analyzed, tickers, titles, a_comments = 0, 0, 0, {}, [], {}
    cmt_auth = {}

    for sub in subs:
        subreddit = reddit.subreddit(sub)
        hot_python = subreddit.hot()  # sorting posts by hot
        # Extracting comments, symbols from subreddit
        for submission in hot_python:
            flair = submission.link_flair_text
            author = submission.author

            # checking: post up vote ratio # of up votes, post flair, and author
            if submission.upvote_ratio >= upvoteRatio and submission.ups > ups and (
                    flair in post_flairs or flair is None) and author not in ignoreAuthP:
                submission.comment_sort = 'new'
                comments = submission.comments
                titles.append(submission.title)
                posts += 1
                submission.comments.replace_more(limit=limit)
                for comment in comments:
                    # skip comments from deleted accounts (comment.author is None)
                    try:
                        auth = comment.author.name
                    except AttributeError:
                        continue
                    c_analyzed += 1

                    # checking: comment up votes and author
                    if comment.score > upvotes and auth not in ignoreAuthC:
                        split = comment.body.split(" ")
                        for word in split:
                            word = word.replace("$", "")
                            # upper = ticker, length of ticker <= 5, excluded words,
                            if word.isupper() and len(word) <= 5 and word not in blacklist and word in crypto:

                                # unique comments; KeyError means first sighting of this symbol
                                if uniqueCmt and auth not in goodAuth:
                                    try:
                                        if auth in cmt_auth[word]: break
                                    except KeyError:
                                        pass

                                # counting tickers
                                if word in tickers:
                                    tickers[word] += 1
                                    a_comments[word].append(comment.body)
                                    cmt_auth[word].append(auth)
                                    count += 1
                                else:
                                    tickers[word] = 1
                                    cmt_auth[word] = [auth]
                                    a_comments[word] = [comment.body]
                                    count += 1

    # sort tickers by mention count, descending
    symbols = dict(sorted(tickers.items(), key=lambda item: item[1], reverse=True))
    top_picks = list(symbols.keys())[0:picks]
    elapsed = time.time() - start_time  # a distinct name avoids shadowing the time module

    # print top picks
    print("It took {t:.2f} seconds to analyze {c} comments in {p} posts in {s} subreddits.\n".format(
        t=elapsed, c=c_analyzed, p=posts, s=len(subs)))
    print("Posts analyzed saved in titles")
    # for i in titles: print(i)  # prints the title of the posts analyzed
    logging.info(top_picks)
    print(f"\n{picks} most mentioned picks: ")
    times = []
    top = []
    for i in top_picks:
        print(f"{i}: {symbols[i]}")
        times.append(symbols[i])
        top.append(f"{i}: {symbols[i]}")

    # Applying Sentiment Analysis
    scores, s = {}, {}

    vader = SentimentIntensityAnalyzer()
    # adding custom words from config
    vader.lexicon.update(new_words)

    picks_sentiment = list(symbols.keys())[0:picks_ayz]
    for symbol in picks_sentiment:
        stock_comments = a_comments[symbol]
        for cmnt in stock_comments:
            score = vader.polarity_scores(cmnt)
            if symbol in s:
                s[symbol][cmnt] = score
            else:
                s[symbol] = {cmnt: score}
            if symbol in scores:
                for key, _ in score.items():
                    scores[symbol][key] += score[key]
            else:
                scores[symbol] = score

        # calculating avg.
        for key in score:
            scores[symbol][key] = scores[symbol][key] / symbols[symbol]
            scores[symbol][key] = "{pol:.3f}".format(pol=scores[symbol][key])

    # printing sentiment analysis
    print(f"\nSentiment analysis of top {picks_ayz} picks:")
    df = pd.DataFrame(scores)
    df.index = ['Bearish', 'Neutral', 'Bullish', 'Total/Compound']
    df = df.T
    # log the dataframe
    logging.info('dataframe head - {}'.format(df.to_string()))
    print(df)


# start the first run; threading.Timer inside the function repeats it every 5 minutes
sentiment_reddit()

Improvements to the Python Reddit sentiment analysis

Take this code to the next level by modifying the script above with additional enhancements, such as charting the results (the script already imports matplotlib and squarify), persisting scores over time, or alerting on sudden sentiment shifts.
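For example, one small enhancement (a sketch, not part of the original script) is to map each symbol's averaged compound score to a readable label, using the +/-0.05 cut-offs conventionally used with VADER:

```python
def sentiment_label(compound):
    """Map a VADER compound score (-1.0 to 1.0) to a coarse label.
    The +/-0.05 thresholds are the ones conventionally used with VADER."""
    if compound >= 0.05:
        return 'positive'
    if compound <= -0.05:
        return 'negative'
    return 'neutral'

print(sentiment_label(0.42))   # positive
print(sentiment_label(-0.31))  # negative
print(sentiment_label(0.01))   # neutral
```

In the script above the averaged scores are formatted as strings before being placed in the DataFrame, so they would need to be converted back with float() before labeling.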

This code is for learning and entertainment purposes only. It has not been audited; use it at your own risk.
