Text mining proves that rappers' vocabulary is significantly richer than that of the Backstreet Boys or Britney Spears
Natural Language Processing (NLP) is a computer science discipline that analyzes and extracts information from texts. Sentiment Analysis and Emotion Mining are sub-fields of this discipline. Sentiment Analysis examines whether a text has a predominantly positive or negative undertone. Emotion Mining analyzes how much of a specific emotion, such as fear, joy or sadness, the text expresses.
Sentiment Analysis and Emotion Mining have many possible commercial applications like helping customer care services and recommending music or movies to online shoppers. In (criminal) investigations, locating emotional communication based on key words expressing anger, cursing or threats can be a good starting point to find out what people want to cover up or hide.
In his project “Sentiment, Emotion & Vocabulary Analysis on Music Lyrics”, Thomas Vrancken conducted a sentiment and emotion analysis on 57 651 songs from 643 artists. For each artist, he computed a positive and a negative sentiment score, as well as a score for each of the eight main emotions: joy, sadness, anger, fear, trust, disgust, anticipation and surprise. Both analyses used a tf-idf algorithm, which weights a word by how often an artist uses it and by how rare it is across the other artists.
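To make the scoring idea concrete, here is a minimal sketch of a tf-idf weighted emotion score. It is not Vrancken's actual code. The toy lexicon, the function names, the whitespace tokenizer, the smoothed idf formula and the choice to treat each artist's combined lyrics as one document are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (emotion -> associated words); a real analysis
# would use a much larger, curated emotion lexicon.
EMOTION_LEXICON = {
    "joy": {"happy", "smile", "love"},
    "anger": {"hate", "fight", "rage"},
}


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_emotion_scores(lyrics_by_artist, lexicon=EMOTION_LEXICON):
    """Return {artist: {emotion: score}}.

    Each artist's lyrics are treated as one document. An emotion score is
    the sum of the tf-idf weights of the lexicon words for that emotion.
    """
    docs = {artist: Counter(tokenize(text)) for artist, text in lyrics_by_artist.items()}
    n_docs = len(docs)

    # Document frequency: in how many artists' lyrics does each word occur?
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())

    scores = {}
    for artist, counts in docs.items():
        total_words = sum(counts.values())
        artist_scores = {}
        for emotion, words in lexicon.items():
            score = 0.0
            for word in words:
                if word in counts:
                    tf = counts[word] / total_words
                    # Smoothed idf (an assumption; several variants exist).
                    idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
                    score += tf * idf
            artist_scores[emotion] = score
        scores[artist] = artist_scores
    return scores
```

Calling `tfidf_emotion_scores({"Artist A": "happy happy smile", "Artist B": "hate fight rage hate"})` gives Artist A a positive joy score and an anger score of zero, and the reverse for Artist B.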
In addition, Vrancken analyzed the breadth of each artist’s vocabulary, based on the number of different words the artist uses.
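A vocabulary measure of this kind can be sketched in a few lines. The function names and the regular-expression tokenizer below are illustrative assumptions, not the project's implementation. Note that a raw count of distinct words grows with the amount of lyrics available, so a real comparison would probably need some normalization.

```python
import re


def vocabulary_size(lyrics):
    """Count the distinct words in a string of lyrics (case-insensitive)."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    return len(set(words))


def rank_by_vocabulary(lyrics_by_artist):
    """Return (artist, vocabulary size) pairs, largest vocabulary first."""
    sizes = {artist: vocabulary_size(text) for artist, text in lyrics_by_artist.items()}
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```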
Surprisingly, there is a strong positive correlation between joy and sadness: artists who sing a lot about joy also express a lot of sadness. On reflection, this result makes sense. Artists (especially pop artists) who sing a lot about happy feelings also tend to produce more melancholic songs about love and heartache. Conversely, most rock and rap artists reflect neither of these emotions in their songs.
Less surprisingly, the results show a negative correlation between joy and anger, but a strong positive correlation between anger and fear. The other correlations seem less relevant. What many listeners would expect intuitively is now backed by statistics.
Looking at the ranking by vocabulary score, one can identify some artists who are known to be quite lyrical, such as Eminem and Wu-Tang Clan. The ranking also confirms the notion that hip-hop artists in general have quite a wide vocabulary.
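For readers who want to see what such a correlation measures, the sketch below computes a Pearson correlation coefficient between two lists of per-artist scores. The article does not say which correlation measure the project used, so Pearson is an assumption here, and the function name is illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)
```

A value close to +1 means that artists scoring high on one emotion also tend to score high on the other. A value close to -1 means the opposite, and a value near 0 means there is no linear relationship.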
Text-mining shows: rappers are more lyrical
However, the groundbreaking results came from calculating regressions and correlations between these scores. These showed that: (1) Artists who express more negative than positive sentiment also tend to be more lyrical, i.e. have a wider vocabulary. (2) Artists who express joy and/or sadness tend to be less lyrical and use a much smaller vocabulary. (3) Artists who use words associated with rap music are more lyrical and have a significantly richer vocabulary than artists who use words linked with pop, such as the Backstreet Boys or Britney Spears.
These results lend statistical support to many theories about music, for instance that pop artists do not bother to develop rich lyrics, whereas hip-hop artists do. By now, the reader probably wonders where Justin Bieber stands in this list. Well: very close to the bottom, among the artists with the smallest vocabulary.
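As an illustration of the regression step, the sketch below fits a straight line by ordinary least squares, for example with a negativity score as the input and the vocabulary score as the output. The article does not describe the exact regression model, so the single-predictor setup and the function name are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A positive slope in such a fit would correspond to finding (1): the more negative an artist's lyrics, the larger the vocabulary score tends to be.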
Thank you Thomas Vrancken for a great research project!