On the logistical difficulties and findings of Jopara Sentiment Analysis
This work addresses sentiment analysis for a low-resource code-switching language, an incremental contribution to NLP for specific linguistic communities.
This paper tackles sentiment analysis for Jopara, a code-switching language between Guarani and Spanish, by collecting a corpus of Guarani-dominant tweets and comparing neural and traditional machine learning models. Transformer architectures achieved the best results, though traditional models came close, owing to the low-resource nature of the problem.
This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data even for relatively easy-to-annotate tasks such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether they perform better than traditional machine learning models in this low-resource setup. Transformer architectures obtain the best results, despite not considering Guarani during pre-training, but traditional machine learning models perform nearly as well, owing to the low-resource nature of the problem.