Sentiment analysis is not solved! Assessing and probing sentiment classification
This work addresses the need for better qualitative analysis in sentiment analysis by providing a resource for evaluating and improving classifiers. The contribution is incremental in that it builds on existing methods.
The authors tackled the problem of identifying the remaining challenges in sentiment analysis by creating a dataset of sentences misclassified by state-of-the-art classifiers, annotating it for 18 linguistic and paralinguistic phenomena such as negation and sarcasm, and demonstrating its use for probing classifier performance.
Neural methods for sentiment analysis (SA) have led to quantitative improvements over previous approaches, but these advances are not always accompanied by a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover which challenges still pose a problem for sentiment classifiers for English and to provide a challenging dataset. We collect the subset of sentences that an (oracle) ensemble of state-of-the-art sentiment classifiers misclassifies and then annotate them for 18 linguistic and paralinguistic phenomena, such as negation, sarcasm, and modality. The dataset is available at https://github.com/ltgoslo/assessing_and_probing_sentiment. Finally, we provide a case study that demonstrates the usefulness of the dataset to probe the performance of a given sentiment classifier with respect to linguistic phenomena.
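The selection and probing steps described in the abstract can be illustrated with a minimal sketch. It assumes that an oracle ensemble counts an example as correct whenever at least one member classifier is correct, so the retained subset is the set of sentences that every classifier gets wrong. The function names, the model names, the toy labels, and the phenomenon annotations below are all hypothetical and are not taken from the released dataset or code.

```python
from collections import defaultdict


def oracle_misclassified(gold, predictions):
    """Return indices of examples that every classifier gets wrong.

    Assumption: the oracle ensemble is right if any member is right,
    so it misclassifies only what all members misclassify.
    """
    return [
        i for i, label in enumerate(gold)
        if all(preds[i] != label for preds in predictions.values())
    ]


def accuracy_by_phenomenon(gold, preds, annotations):
    """Accuracy of one classifier, broken down by annotated phenomenon.

    `annotations[i]` is the list of phenomena tagged on example i;
    an example with several tags counts towards each of them.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, pred, phenomena in zip(gold, preds, annotations):
        for phenomenon in phenomena:
            total[phenomenon] += 1
            correct[phenomenon] += int(pred == label)
    return {p: correct[p] / total[p] for p in total}


# Toy data (hypothetical): gold labels and two classifiers' predictions.
gold = ["pos", "neg", "neg", "pos"]
predictions = {
    "model_a": ["pos", "pos", "pos", "neg"],
    "model_b": ["neg", "neg", "pos", "neg"],
}

# Step 1: keep only the sentences that all classifiers misclassify.
hard = oracle_misclassified(gold, predictions)

# Step 2: annotate the hard subset (hand-written tags here).
hard_gold = [gold[i] for i in hard]
hard_annotations = [["negation"], ["sarcasm", "negation"]]

# Step 3: probe a new classifier on the hard subset, per phenomenon.
new_model_preds = ["neg", "neg"]
scores = accuracy_by_phenomenon(hard_gold, new_model_preds, hard_annotations)
print(hard, scores)
```

Reporting accuracy per phenomenon rather than a single aggregate score is what makes such a subset useful for probing: it shows which phenomena a given classifier handles and which it does not.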