CL AI Feb 10, 2025

RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset

arXiv:2502.06180v18.36 citations h-index: 8 Has Code WASSA

Originality Incremental advance

AI Analysis

This work addresses sentiment and emotion detection in low-resource languages, specifically in Kenyan code-switched Twitter content, which matters to individuals and organizations that analyze opinions and emotions in the region.

The authors tackle sentiment and emotion detection in low-resource, user-generated, code-switched Kenyan Twitter content. The best models reach 69.2% accuracy for sentiment analysis and 59.8% for emotion analysis: XLM-R leads on sentiment and DistilBERT leads on emotion.

Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging because content is scarce and of poor quality, and because language use varies widely through slang and code-switching. Identifying tweets in these languages can be difficult, as Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms the other models in sentiment analysis: the XLM-R supervised model achieves the highest accuracy (69.2%) and F1 score (66.1%), followed by XLM-R semi-supervised (67.2% accuracy, 64.1% F1 score). In emotion analysis, DistilBERT supervised leads in accuracy (59.8%) and F1 score (31%), followed by mBERT semi-supervised (59% accuracy, 26.5% F1 score). AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with AfriBERTa showing the highest bias and a unique sensitivity to the empathy emotion. Repository: https://github.com/NEtori21/Ride_hailing
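The gap between accuracy and F1 reported in the abstract (for example, 59.8% accuracy against a 31% F1 score in emotion analysis) is consistent with classifiers that favour a majority class such as neutral. The following is a minimal sketch of that effect, not the authors' evaluation code: the labels are invented toy data, and macro-averaged F1 is an assumption, since the abstract does not say which averaging was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 60% of gold labels are neutral,
# and the classifier predicts neutral for every example.
y_true = ["neutral"] * 6 + ["positive"] * 2 + ["negative"] * 2
y_pred = ["neutral"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case the always-neutral classifier scores well on accuracy because it matches the majority class, while macro F1 is pulled down by the two classes it never predicts.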
