CL IR LG · Mar 6, 2020

Quality of Word Embeddings on Sentiment Analysis Tasks

arXiv:2003.03264v1 · 0.73 citations

Originality: Synthesis-oriented

AI Analysis

This work provides practical guidance for selecting word embeddings in sentiment analysis, but it is incremental as it compares existing models without introducing new methods.

The study compared a dozen pretrained word embedding models on sentiment analysis tasks. The Twitter Tweets embeddings performed best on lyrics sentiment analysis, Google News and Common Crawl performed best on movie review polarity, and GloVe models slightly outperformed Skip-gram models.

Word embeddings, or distributed representations of words, are used in various applications such as machine translation, sentiment analysis, and topic identification. The quality of word embeddings and the performance of their applications depend on several factors, such as training method, corpus size, and corpus relevance. In this study we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, Twitter Tweets is the best on lyrics sentiment analysis, whereas Google News and Common Crawl are the top performers on movie polarity analysis. GloVe-trained models slightly outperform those trained with Skip-gram. Factors such as topic relevance and corpus size also significantly affect the quality of the models. When medium or large-sized text sets are available, obtaining word embeddings from the same training dataset is usually the best choice.
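To make the comparison setup concrete, the following is a minimal sketch of how two embedding tables might be scored on the same sentiment task. It is not the paper's method: the abstract does not say which classifier was used, so averaged word vectors with a nearest-centroid classifier are assumed here purely for illustration. The tables `EMB_A` and `EMB_B`, their 2-dimensional values, the toy `TRAIN` and `TEST` sets, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Sequence[str], str]  # (tokens, label)

# Hypothetical toy embeddings; values are made up, not taken from any real model.
EMB_A: Dict[str, Vector] = {
    "love": [0.9, 0.1], "great": [0.8, 0.2],
    "hate": [0.1, 0.9], "awful": [0.2, 0.8],
}
# A deliberately uninformative table: every word maps to the same point.
EMB_B: Dict[str, Vector] = {w: [0.5, 0.5] for w in EMB_A}

TRAIN: List[Example] = [(["love", "great"], "pos"), (["hate", "awful"], "neg")]
TEST: List[Example] = [(["great"], "pos"), (["awful"], "neg"),
                       (["love"], "pos"), (["hate"], "neg")]


def doc_vector(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def evaluate(emb: Dict[str, Vector], train: List[Example],
             test: List[Example], dim: int = 2) -> float:
    """Accuracy of a nearest-centroid classifier built on averaged embeddings."""
    by_label: Dict[str, List[Vector]] = {}
    for tokens, label in train:
        by_label.setdefault(label, []).append(doc_vector(tokens, emb, dim))
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    correct = 0
    for tokens, label in test:
        v = doc_vector(tokens, emb, dim)
        # Ties resolve to the label seen first in the training data.
        pred = min(centroids, key=lambda c: sq_dist(v, centroids[c]))
        correct += pred == label
    return correct / len(test)


if __name__ == "__main__":
    for name, emb in [("EMB_A", EMB_A), ("EMB_B", EMB_B)]:
        print(name, evaluate(emb, TRAIN, TEST))
```

Holding the classifier and data fixed while swapping only the embedding table is what lets differences in accuracy be attributed to the embeddings. In practice the toy dictionaries would be replaced by vectors loaded from pretrained files, and the classifier by whatever model the task calls for.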
