CL LG · May 6, 2021

On the logistical difficulties and findings of Jopara Sentiment Analysis

Marvin M. Agüero-Torales, David Vilares, Antonio G. López-Herrera

arXiv:2105.02947v231.7727 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses sentiment analysis for a low-resource code-switching language, an incremental contribution to NLP for specific linguistic communities.

This paper tackles sentiment analysis for Jopara, a code-switching language between Guarani and Spanish, by collecting a corpus of Guarani-dominant tweets and comparing neural and traditional machine learning models. Transformer architectures achieved the best results, though traditional models came close because of the low-resource nature of the problem.

This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data even for relatively easy-to-annotate tasks such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether they perform better than traditional machine learning models in this low-resource setup. Transformer architectures obtain the best results, despite not having seen Guarani during pre-training, but traditional machine learning models perform comparably because of the low-resource nature of the problem.
