Twitter Sentiment Analysis using R

A detailed sentiment analysis of Karnataka State Elections 2018 in India and gauge its impact on the final results. The entire summary of the project can be found in the presentation. The code has been reviewed by Rachael Tatman, Data Scientist at Kaggle, and can be watched on YouTube.

General info

The purpose of the project is to understand how we can extract tweets from Twitter and perform sentiment analysis on it. I find Indian politics very fascinating, and since the elections of one of the major state were coming up, I thought it would be interesting to understand how the sentiments are playing for the major parties and their leaders.

Screenshots

Technologies and Tools

R - version 3.4
Microsoft Excel

Setup

The steps to extract tweets from Twitter with specific filters can be found in data extraction. Two files of positive and negative words have been used to classify the sentiment of the tweets and can be found here. The code can be used to replicate the results.

Code Examples

Some examples of usage:

ktk <- searchTwitter('#karnatka', n=no.of.tweets, lang="en")
ktk2 <- searchTwitter('#karnatkaelections2018', n=no.of.tweets, lang="en")
ktk3 <- searchTwitter('#karnatkaelection', n=no.of.tweets, lang="en")
ktk4 <- searchTwitter('#battleforkarnatka', n=no.of.tweets, lang="en")
ktk5 <- searchTwitter('#karnatkakurukshetra', n=no.of.tweets, lang="en")
ktk6 <- searchTwitter('#KarnatkaAssembly', n=no.of.tweets, lang="en")
ktk7 <- searchTwitter('#karnatkavoting', n=no.of.tweets, lang="en")
ktk8 <- searchTwitter('#karnatkapolling', n=no.of.tweets, lang="en")
bjp <- searchTwitter('bjp', n=10000, lang="en")
congress <- searchTwitter('congress', n=2000, lang="en")
namo <- searchTwitter('narendra modi', n=2000, lang="en")
raga <- searchTwitter('rahul gandhi', n=2000, lang="en")

core.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  require(plyr)
  require(stringr)
  
  # we got a vector of sentences. plyr will handle a list
  # or a vector as an "l" for us
  # we want a simple array ("a") of scores back, so we use 
  # "l" + "a" + "ply" = "laply":
  
  scores = laply(sentences, function(sentence, pos.words, neg.words) {
    
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)
    
    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)
    
    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    
    # match() returns the position of the matched term or NA
    # we just want a TRUE/FALSE:
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    
    return(score)
  }, pos.words, neg.words, .progress=.progress )
  
  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}

## Narendra Modi
namog <- ldply(namo,function(t) t$toDataFrame() )
result1 <- score.sentiment(namog$text,pos.words,neg.words)
summary(result1$score)
hist(result1$score,col = 'dark orange', main = 'Sentiment Analysis for Narendra Modi ', ylab = 'Count of tweets')
count(result1$score)
library(xlsx)
write.xlsx(result1, "myResults.xlsx")

Features

The sentiment analysis of the tweets was done successfully. The sentiment, however, does not indicate much deviations towards any particular party or leader.

To-do list:

Extract tweets for the entire week of election to gauge a trend in the sentiments.
Expand the horizon of the tweets extracted by including more political parties, leaders, and more hashtags.

Status

Project is: finished, however, I would like to implement sentiment analysis in python using Natural Language Processing.

Inspiration

Indian news media is filled with umpteen number of analysis and statistics during elections. I always fancied everything displayed on the screen and wanted to try hands-on with some of the analysis done behind the screen.

Contact

Created by me.

If you loved what you read here and feel like we can collaborate to produce some exciting stuff, or if you just want to shoot a question, please feel free to connect with me on email, LinkedIn, or Twitter. My other projects can be found here.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
data		data
img		img
Data Extraction		Data Extraction
README.md		README.md
Sentiment Analysis of Karnatka State Elections 2018.pdf		Sentiment Analysis of Karnatka State Elections 2018.pdf
Sentiment Data Extraction.R		Sentiment Data Extraction.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

img

img

Data Extraction

Data Extraction

README.md

README.md

Sentiment Analysis of Karnatka State Elections 2018.pdf

Sentiment Analysis of Karnatka State Elections 2018.pdf

Sentiment Data Extraction.R

Sentiment Data Extraction.R

Repository files navigation

Twitter Sentiment Analysis using R

Table of contents

General info

Screenshots

Technologies and Tools

Setup

Code Examples

Features

Status

Inspiration

Contact

About

Releases

Packages

Contributors 2

Languages

harshbg/Twitter-Sentiment-Analysis-using-R

Folders and files

Latest commit

History

Repository files navigation

Twitter Sentiment Analysis using R

Table of contents

General info

Screenshots

Technologies and Tools

Setup

Code Examples

Features

Status

Inspiration

Contact

About

Topics

Resources

Stars

Watchers

Forks

Languages