VADER: A Parsimonious Rule-based Model

A few weeks ago, I came across a Python sentiment analysis package called VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER is a lexicon and rule-based model for text sentiment analysis that is sensitive to both the polarity (positive or negative) and the intensity of emotion. Its creators built it with a human-centric approach, combining qualitative analysis with empirical validation by human raters.

With VADER, you can perform sentiment analysis easily, even if you don’t have positive or negative text examples to train a classifier and don’t want to write custom code that searches for specific words in a sentiment word list.

It is also easy to understand how VADER works: it maps lexical features to sentiment intensities and combines them into an overall score. Before digging into that, however, let us first take a look at sentiment analysis itself.

So, what is sentiment analysis?

Sentiment analysis is the process of computationally determining the polarity of a text: whether it is positive, negative, or neutral. There are two broad ways to do this. A polarity-based approach only identifies whether a text is positive or negative, while a valence-based approach also measures the intensity of the sentiment. For example, a polarity-based approach treats “good” and “great” as the same, whereas a valence-based approach rates “great” as more intense than “good”.
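To make the distinction concrete, here is a minimal Python sketch. The ratings in VALENCE are hypothetical values on a -4 to +4 scale (the range VADER’s lexicon uses); the point is only that the polarity view collapses “good” and “great” to the same label, while the valence view keeps their difference in strength.

```python
# Hypothetical valence ratings on a -4 (most negative) to +4 (most positive) scale.
VALENCE = {"good": 1.9, "great": 3.1, "bad": -2.5}


def polarity(word: str) -> str:
    """Polarity-based view: only the sign of the rating matters."""
    score = VALENCE.get(word.lower(), 0.0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def valence(word: str) -> float:
    """Valence-based view: the magnitude of the rating is kept as well."""
    return VALENCE.get(word.lower(), 0.0)


for w in ("good", "great", "bad", "table"):
    print(f"{w:6} polarity={polarity(w):8} valence={valence(w):+.1f}")
```

Both views agree on direction; only the valence view can say that “great” is stronger than “good”.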

Sentiment analysis is useful in human-machine interaction and in research, and it has many practical benefits. For example, companies often provide a platform where customers can leave feedback about their products or services. That feedback can then be analyzed, and the results used to gauge how a product is performing in the market and where it needs to improve.

How does VADER achieve sentiment analysis?

VADER uses a lexical approach: it analyzes text with a sentiment lexicon, a dictionary in which each entry is rated according to how positive or negative it is. VADER checks each word of a statement against the lexicon, and the ratings of the words it finds are combined to measure the emotional intensity of the statement.
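The lookup-and-combine idea can be sketched in a few lines of Python. This is a toy, not VADER itself: the five-word lexicon and its ratings are hypothetical, and the tokenizer is deliberately naive. The one piece borrowed from the reference implementation is the final squashing step, which maps the summed valence x into the range (-1, 1) as x / sqrt(x² + α) with α = 15; the real package reports that value as the compound score (alongside neg, neu and pos proportions) from SentimentIntensityAnalyzer().polarity_scores(text). The ±0.05 cut-offs in label() follow the thresholds suggested in the project’s README.

```python
import math
import re

# Hypothetical mini-lexicon: word -> mean valence on a -4..+4 scale.
LEXICON = {"love": 3.2, "great": 3.1, "good": 1.9, "bad": -2.5, "hate": -2.7}


def normalize(total: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into (-1, 1) using x / sqrt(x^2 + alpha)."""
    return total / math.sqrt(total * total + alpha)


def compound(text: str) -> float:
    """Look every token up in the lexicon, sum the hits, and normalize the sum."""
    tokens = re.findall(r"[a-z']+", text.lower())          # naive word tokenizer
    total = sum(LEXICON.get(tok, 0.0) for tok in tokens)   # unknown words contribute 0
    return normalize(total)


def label(score: float, threshold: float = 0.05) -> str:
    """Map a compound score to a coarse positive/negative/neutral label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


for sentence in ("I love this phone, the camera is great",
                 "I hate waiting",
                 "The package arrived on Tuesday"):
    score = compound(sentence)
    print(f"{score:+.3f} {label(score):8} {sentence}")
```

No training step appears anywhere: the lexicon ratings do all the work. Because the squashing function only approaches ±1 asymptotically, adding more positive words keeps raising the score without ever pushing it past 1.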

You might be wondering how anyone can measure emotions at all.

The answer lies in lexical features: essentially anything people use to communicate in text is taken into consideration.

Some messages look neutral unless you account for the acronyms, emoticons, or punctuation marks they use. A tweet, for example, can read as completely neutral while its emoticons, slang, or acronyms change its emotional intensity. A model that ignores such current colloquialisms can therefore miss the sentiment of a text entirely. VADER handles this by including emoticons, slang, and acronyms in its lexicon, so they contribute to the intensity of the text just like ordinary words.
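Here is a small sketch of why those entries matter. The lexicon values are hypothetical, and the tokenizer is an assumption made for illustration: it splits on whitespace and strips punctuation from word edges, but leaves a token untouched if it is already a lexicon entry, so emoticons such as :) survive. Without the emoticon entry, the second message would score exactly like the first.

```python
import string

# Hypothetical lexicon mixing ordinary words with emoticons, an acronym and slang.
# The values are illustrative only, not VADER's actual ratings.
LEXICON = {"good": 1.9, ":)": 2.0, ":(": -1.9, "lol": 1.8, "meh": -0.3}


def tokenize(text: str) -> list[str]:
    """Whitespace split; strip edge punctuation unless the token is itself a lexicon entry."""
    tokens = []
    for raw in text.lower().split():
        # Emoticons are made of punctuation, so stripping them first would erase them.
        tok = raw if raw in LEXICON else raw.strip(string.punctuation)
        if tok:
            tokens.append(tok)
    return tokens


def raw_valence(text: str) -> float:
    """Sum of lexicon ratings for all tokens; tokens not in the lexicon count as 0."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokenize(text))


print(raw_valence("Meeting moved to Friday"))     # no sentiment-bearing tokens
print(raw_valence("Meeting moved to Friday :)"))  # the emoticon carries the sentiment
print(raw_valence("Meeting moved to Friday :("))
```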

Some words might not seem negative to you but might be very offensive to someone else, which is yet another problem that must be considered when measuring emotional intensity. To counter this, VADER’s creators had several independent human raters score each lexicon entry and used the average rating as its valence.

Other elements of a text also shape the emotion of a statement, including contextual cues such as capitalization, punctuation, and modifiers. VADER accounts for these with five simple heuristics: punctuation (especially exclamation points), capitalization, degree modifiers such as “extremely”, the contrastive conjunction “but”, and negation. The strength of each heuristic was also quantified using human raters.
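Both ideas can be sketched together. First, each lexicon entry’s valence is the mean of several raters’ scores on the -4 to +4 scale (the ratings below are invented). Second, simplified versions of the five heuristics adjust those valences before they are summed. The constants are assumed to mirror the ones in the open-source vaderSentiment module, and the rules are deliberately cut down (boosters are checked only one word back, only the first “but” is handled, and the word lists are tiny and hypothetical), so treat this as an illustration of the mechanics rather than a reimplementation.

```python
import math

# Lexicon: each entry's valence is the mean of several raters' scores on a -4..+4 scale.
# These ratings are invented for illustration.
RATINGS = {
    "good": [2, 2, 1, 2, 3, 2, 1, 2, 2, 2],
    "bad": [-3, -2, -2, -3, -2, -3, -2, -2, -3, -3],
}
LEXICON = {word: sum(r) / len(r) for word, r in RATINGS.items()}  # good -> 1.9, bad -> -2.5

# Assumed constants, intended to mirror the open-source vaderSentiment module.
BOOST = 0.293       # degree modifiers such as "extremely"
CAPS_BOOST = 0.733  # ALL-CAPS sentiment word in otherwise mixed-case text
NEGATE = -0.74      # scalar applied when a negator appears in the 3 preceding words
EXCLAIM = 0.292     # added per "!", counting at most 4

# Hypothetical, tiny word lists; the real ones are much longer.
BOOSTERS = {"extremely": BOOST, "very": BOOST, "slightly": -BOOST}
NEGATORS = {"not", "never", "isn't", "wasn't"}


def push(valence: float, amount: float) -> float:
    """Shift a valence by amount in the direction of its own sign (a negative amount dampens it)."""
    return valence + amount if valence > 0 else valence - amount


def sentiment_sum(text: str) -> float:
    """Sum of heuristic-adjusted lexicon valences for one sentence."""
    tokens = [t for t in (w.strip(".,!?") for w in text.split()) if t]
    lowered = [t.lower() for t in tokens]
    caps = [t.isupper() for t in tokens]
    mixed_case = any(caps) and not all(caps)

    valences = []
    for i, word in enumerate(lowered):
        v = LEXICON.get(word, 0.0)
        if v != 0.0:
            if mixed_case and caps[i]:                     # capitalization
                v = push(v, CAPS_BOOST)
            if i > 0 and lowered[i - 1] in BOOSTERS:       # degree modifier (one word back only)
                v = push(v, BOOSTERS[lowered[i - 1]])
            if any(w in NEGATORS for w in lowered[max(0, i - 3):i]):  # negation
                v *= NEGATE
        valences.append(v)

    if "but" in lowered:                                   # contrastive "but"
        k = lowered.index("but")                           # "but" itself has valence 0
        valences = [v * 0.5 if j < k else v * 1.5 for j, v in enumerate(valences)]

    total = sum(valences)
    bangs = min(text.count("!"), 4) * EXCLAIM              # punctuation
    if total > 0:
        total += bangs
    elif total < 0:
        total -= bangs
    return total


def compound(text: str, alpha: float = 15.0) -> float:
    """Normalize the adjusted sum into (-1, 1) as x / sqrt(x^2 + alpha)."""
    total = sentiment_sum(text)
    return total / math.sqrt(total * total + alpha)


for s in ("The food is good",
          "The food is good!!",
          "The food is GOOD",
          "The food is extremely good",
          "The food is not good",
          "The food is good but the service is bad"):
    print(f"{compound(s):+.3f}  {s}")
```

Each heuristic nudges the same base valence of 1.9 in a different direction. The “but” rule is why the last sentence comes out clearly negative, even though its two sentiment words would nearly cancel (1.9 - 2.5 = -0.6) if they were weighted equally.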

Conclusion

VADER combines a dictionary of rated lexical features with a set of five heuristics to produce sentiment scores. It is an excellent tool for analyzing social media text, movie reviews, and opinion articles.

The open-source Python implementation is available on GitHub (and on PyPI as vaderSentiment), so you can try sentiment analysis on your own text.

 

A project based on VADER sentiment analysis is in the works. Stay tuned 🙂

 
