CL, Jun 24, 2019

Embedding Projection for Targeted Cross-Lingual Sentiment: Model Comparisons and a Real-World Study

arXiv:1906.10519v10.715 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the lack of annotated data for fine-grained sentiment analysis in under-resourced languages. It is an incremental advance because it builds on existing projection-based methods.

The paper tackles sentiment analysis for under-resourced languages by proposing a cross-lingual model that incorporates sentiment into bilingual embeddings. The model achieves state-of-the-art sentence-level performance when combined with machine translation, and it outperforms other projection-based bilingual embedding methods on binary targeted sentiment tasks across multiple domains.

Sentiment analysis benefits from large, hand-annotated resources for training and testing machine learning models, which are often data hungry. While some languages, e.g., English, have a vast array of these resources, most under-resourced languages do not, especially for fine-grained sentiment tasks such as aspect-level or targeted sentiment analysis. To improve this situation, we propose a cross-lingual approach to sentiment analysis that is applicable to under-resourced languages and takes target-level information into account. This model incorporates sentiment information into bilingual distributional representations by jointly optimizing them for semantics and sentiment, showing state-of-the-art performance at the sentence level when combined with machine translation. The adaptation to targeted sentiment analysis on multiple domains shows that our model outperforms other projection-based bilingual embedding methods on binary targeted sentiment tasks. Our analysis on ten languages demonstrates that the amount of unlabeled monolingual data has surprisingly little effect on the sentiment results. As expected, choosing an annotated source language that is similar to the target language leads to better projection results. Our results therefore suggest that more effort should be spent on creating resources for languages that are less similar to those which are already resource-rich. Finally, a domain mismatch between training and test data decreases performance, which suggests that resources in any language should ideally cover a variety of domains.
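The abstract's idea of jointly optimizing bilingual representations for semantics and sentiment can be illustrated with a small sketch. This is not the paper's implementation: the function names, the squared-distance projection term, the softmax classifier, and the weighting parameter `alpha` are all assumptions chosen for illustration, since the abstract does not state the exact objective.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def joint_loss(proj_src, proj_tgt, classifier, translation_pairs,
               labeled_examples, alpha=0.5):
    """Hypothetical joint objective (an assumption, not the paper's formula).

    proj_src, proj_tgt: projection matrices into a shared space.
    classifier: matrix mapping the shared space to class scores.
    translation_pairs: (source_vector, target_vector) dictionary entries.
    labeled_examples: (source_vector, class_index) sentiment examples.
    alpha: weight of the sentiment term versus the semantic term.
    """
    # Semantic term: translation pairs should be close after projection.
    semantic = sum(
        squared_distance(matvec(proj_src, s), matvec(proj_tgt, t))
        for s, t in translation_pairs
    ) / len(translation_pairs)

    # Sentiment term: cross-entropy of a classifier on projected source vectors.
    sentiment = 0.0
    for vector, label in labeled_examples:
        probs = softmax(matvec(classifier, matvec(proj_src, vector)))
        sentiment -= math.log(probs[label])
    sentiment /= len(labeled_examples)

    return alpha * sentiment + (1 - alpha) * semantic

# Toy usage with 2-dimensional vectors (all values are made up).
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_classifier = [[0.0, 0.0], [0.0, 0.0]]
pairs = [([1.0, 0.0], [1.0, 0.0])]
examples = [([1.0, 0.0], 1)]
loss = joint_loss(identity, identity, zero_classifier, pairs, examples)
```

In this toy setup the translation pair already coincides, so the semantic term is zero, and the untrained classifier assigns equal probability to both classes, so the loss reduces to `alpha` times ln 2. Because the sentiment labels are only needed on the source side, a sketch like this shows why such an approach could suit a target language that has no annotated data.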
