CL · AI · Jan 20, 2020

Unsupervised Sentiment Analysis for Code-mixed Data

arXiv:2001.11384v1 · 0.816 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the poor performance of sentiment analysis on code-mixed text, which is common in multilingual societies, and offers a strong but narrowly scoped gain in a specialized area.

The paper tackles sentiment analysis of code-mixed text with methods that use multilingual and cross-lingual embeddings to transfer knowledge from monolingual text. The methods beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score. In zero-shot settings they reach 0.58 F1 without a parallel corpus and 0.62 F1 with one, compared with 0.68 F1 in supervised settings.

Code-mixing is the practice of alternating between two or more languages. It is observed mostly in multilingual societies, and both its occurrence and its importance are increasing. A major part of sentiment analysis research has been monolingual, and most of these methods perform poorly on code-mixed text. In this work, we introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis. Our methods can handle code-mixed text through zero-shot learning. They beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score. On the same benchmark, we achieve 0.58 F1-score (without a parallel corpus) and 0.62 F1-score (with a parallel corpus) in the zero-shot setting, compared with 0.68 F1-score in supervised settings. Our code is publicly available.
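The core idea of zero-shot transfer through a shared embedding space can be illustrated with a toy sketch: if English and Spanish words with the same meaning have nearby vectors, a classifier trained only on English sentences can still label code-mixed sentences. This is not the authors' method. The two-dimensional vectors in `EMBEDDINGS` are hand-made assumptions standing in for real multilingual or cross-lingual embeddings, and a simple nearest-centroid classifier is swapped in for whatever model the paper uses. All names (`embed`, `train_centroids`, `predict`) are hypothetical.

```python
from math import dist

# Hypothetical, hand-made "aligned" embeddings (NOT real multilingual vectors):
# translation pairs are placed close together so English and Spanish share one space.
# The first dimension loosely encodes sentiment; the second encodes topical content.
EMBEDDINGS = {
    "good": (0.9, 0.1), "bueno": (0.85, 0.15),
    "great": (0.95, 0.05), "genial": (0.9, 0.05),
    "bad": (-0.9, 0.1), "malo": (-0.85, 0.1),
    "terrible": (-0.95, 0.0), "horrible": (-0.9, 0.05),
    "movie": (0.0, 0.9), "película": (0.05, 0.85),
    "the": (0.0, 0.3), "la": (0.0, 0.3),
    "was": (0.0, 0.2), "fue": (0.0, 0.2),
}


def embed(sentence):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [EMBEDDINGS[t] for t in sentence.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def train_centroids(examples):
    """Fit a nearest-centroid classifier on monolingual (English-only) examples."""
    grouped = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(embed(text))
    return {
        label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(centroids, text):
    """Assign the label whose centroid is closest to the sentence vector."""
    vec = embed(text)
    return min(centroids, key=lambda label: dist(vec, centroids[label]))


# Training data is English only; no code-mixed sentence is ever seen.
ENGLISH_TRAIN = [
    ("the movie was good", "positive"),
    ("the movie was great", "positive"),
    ("the movie was bad", "negative"),
    ("the movie was terrible", "negative"),
]

centroids = train_centroids(ENGLISH_TRAIN)
print(predict(centroids, "the película was genial"))  # positive
print(predict(centroids, "la movie fue horrible"))    # negative
```

The classifier never sees Spanish or code-mixed text during training; it generalizes only because the (assumed) embedding space places "genial" near "great" and "horrible" near "terrible". With real embeddings, the quality of that cross-lingual alignment is what limits zero-shot accuracy.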

