Eminem vs Miley Cyrus. NLP

We already have a script for lyrics analysis and have used it on the lyrics of Miley Cyrus. Now let’s look at Eminem. Rappers probably use a more diverse vocabulary, so we can expect different statistics.

Preprocessing

I obtained (details here) and preprocessed the dataset in the same way, with one exception: text inside square brackets is now deleted entirely. This helps exclude noisy tokens such as 2x.
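The bracket removal described above could look roughly like the sketch below. The function name `strip_bracketed` and the regular expression are assumptions made for illustration; the original preprocessing code is not shown in this post, and nested brackets are not handled.

```python
import re

# Matches one bracketed span such as "[Chorus]" or "[2x]".
# Assumption: brackets are not nested.
BRACKETED = re.compile(r"\[[^\]]*\]")


def strip_bracketed(text: str) -> str:
    """Remove every [...] span, then collapse the leftover runs of spaces."""
    without_brackets = BRACKETED.sub(" ", text)
    return re.sub(r"[ \t]+", " ", without_brackets).strip()


print(strip_bracketed("[Verse 1] hello world [2x] again"))
```

Replacing each span with a space rather than an empty string keeps the words on either side of a bracket from being glued together.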

Number of songs

Let’s look at how many albums and songs Eminem released.

[Figure: Number of songs in the albums]

The first difference is that Eminem has released far more songs over his career (396 vs 160), although this can be explained by the length of that career: he released his first album 11 years earlier than Miley Cyrus.

word_counter = get_word_counter(lyrics_df.tokens)
print(len(word_counter), 'unique words')

> 12792 unique words
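The `get_word_counter` helper is not defined in this post. A minimal sketch of what it might do, assuming each element of `lyrics_df.tokens` is a list of string tokens for one song, is given below; the toy `songs` list stands in for the real data.

```python
from collections import Counter
from typing import Iterable


def get_word_counter(token_lists: Iterable[Iterable[str]]) -> Counter:
    """Count the occurrences of every token across all songs.

    Assumed implementation: the original helper is not shown in the post.
    """
    counter = Counter()
    for tokens in token_lists:
        counter.update(tokens)
    return counter


# Toy data standing in for lyrics_df.tokens
songs = [["hello", "world", "hello"], ["world", "again"]]
word_counter = get_word_counter(songs)
print(len(word_counter), "unique words")
```

The vocabulary size is simply the number of keys in the counter, which is why `len(word_counter)` is printed.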

As a reminder, the vocabulary size in the previous dataset was about 2100 unique words. For Eminem the number is roughly six times larger: 12792. That’s impressive! Let’s look at the ranges of the number of words and the number of unique words in the lyrics.

Number of words

[Figure: Distribution of word counts in albums]

[Figure: Distribution of word counts in the dataset]

Most songs contain 200 to 400 words, again several times more than in the Miley Cyrus dataset (about 70 to 130). There is even one song with more than 1700 words, whereas the maximum for Miley Cyrus is about 280.

[Figure: Distribution of unique word counts in albums]

[Figure: Distribution of unique word counts in the dataset]

Around half of the lyrics contain 150 to 250 unique words, and the count goes up to 500, with some outliers on both sides of the chart. The maximum number of unique words in a single lyric is about 1150. The shortest songs, with fewer than 70 words, can be explained by album-opening tracks and songs featuring another artist.
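The per-song numbers behind these distributions can be computed in a few lines. This is a sketch under the assumption that each song is already a list of tokens; `song_word_stats` and the toy data are hypothetical, not taken from the original script.

```python
from statistics import median


def song_word_stats(tokens):
    """Return (total words, unique words) for one tokenized song."""
    return len(tokens), len(set(tokens))


# Toy data standing in for the tokenized lyrics
songs = [
    ["a", "b", "a", "c"],
    ["a", "a", "a"],
    ["d", "e", "f", "g", "h"],
]

stats = [song_word_stats(tokens) for tokens in songs]
totals = [total for total, _ in stats]
uniques = [unique for _, unique in stats]

print("median words:", median(totals))
print("median unique words:", median(uniques))
print("max unique words:", max(uniques))
```

The lists `totals` and `uniques` are what a histogram or box plot of the dataset would be drawn from.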

Most common and rare

Let’s display the most frequent and common words (FYI, the list may contain offensive or rude words).

[Figure: Number of occurrences of the most common words]

[Figure: The number of occurrences of the most common words, counting only once per song]
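The two charts rely on two different counts: every occurrence of a word, and at most one occurrence per song. A minimal sketch of both is given below, together with one common variant of TF-IDF weighting that reuses the per-song counts. All function names and the toy data are assumptions for illustration; the original plotting code is not shown.

```python
import math
from collections import Counter


def term_and_document_frequency(token_lists):
    """Return (total occurrences per word, number of songs containing each word)."""
    term_freq, doc_freq = Counter(), Counter()
    for tokens in token_lists:
        term_freq.update(tokens)      # every occurrence counts
        doc_freq.update(set(tokens))  # at most one count per song
    return term_freq, doc_freq


def tf_idf(tokens, doc_freq, n_docs):
    """Score the words of one song as tf * log(n_docs / df)."""
    counts = Counter(tokens)
    return {
        word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }


# Toy data standing in for the tokenized lyrics
songs = [["know", "know", "like"], ["know", "love"], ["like", "like", "want"]]
term_freq, doc_freq = term_and_document_frequency(songs)
print(term_freq.most_common(2))
print(doc_freq.most_common(2))

# Words that appear in fewer songs receive a higher weight
scores = tf_idf(songs[1], doc_freq, len(songs))
print(scores)
```

With this weighting, a word that appears in every song gets a score of zero, which is one way to down-weight words that show up everywhere.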

The charts above show the most frequent words (highest total count) and the most common words (found in the largest number of songs). As expected, the top of the list includes some swear words, but it also contains know, love, want and like. The most frequent words, know and like, are the same as for Miley Cyrus. In fact, such words can be excluded during preprocessing. You can also determine which words are most valuable using other algorithms or metrics (e.g. TF-IDF). Let’s plot one more chart:

[Figure: Correlation. Dependence of vocabulary size on the number of songs in the albums]

The line plot shows a positive correlation between the variety of words in an album and the number of songs in it: the more songs, the wider the vocabulary. This suggests that Eminem tries to use different words across his lyrics.
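The relationship in the plot can also be put into a single number. The sketch below assumes the data is available as (album, tokens) pairs and uses the Pearson coefficient as the measure; `album_vocabulary`, `pearson` and the toy albums are hypothetical, and the original post does not say which measure, if any, was computed.

```python
import math
from collections import defaultdict


def album_vocabulary(songs):
    """songs: iterable of (album, tokens). Returns {album: (n_songs, vocab_size)}."""
    n_songs = defaultdict(int)
    vocab = defaultdict(set)
    for album, tokens in songs:
        n_songs[album] += 1
        vocab[album].update(tokens)
    return {album: (n_songs[album], len(vocab[album])) for album in n_songs}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: three albums with one, two and three songs
songs = [
    ("A", ["a", "b"]),
    ("B", ["a", "b", "c"]), ("B", ["c", "d"]),
    ("C", ["a", "e"]), ("C", ["f", "g"]), ("C", ["h", "a"]),
]
per_album = album_vocabulary(songs)
counts = [n for n, _ in per_album.values()]
sizes = [v for _, v in per_album.values()]
print(per_album)
print(round(pearson(counts, sizes), 3))
```

A coefficient close to 1 would match the upward trend of the line plot.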

We have calculated some statistics for Eminem’s lyrics and compared them with those for Miley Cyrus. We can’t confidently extend the findings to all pop and rap artists, but the results are still interesting.