CL | Mar 30, 2025

Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions

Mikhail Krasitskii, Olga Kolesnikova, Liliana Chanona Hernandez, Grigori Sidorov, Alexander Gelbukh

arXiv:2503.23295v1 | 17.613 citations | h-index: 21 | Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

Originality: Synthesis-oriented

AI Analysis

The paper addresses sentiment analysis for Tamil-English code-mixed text, but the work is incremental: it builds on existing transformer methods without introducing major innovations.

This paper tackles sentiment analysis in Tamil-English code-mixed texts by evaluating transformer models like XLM-RoBERTa and mT5, finding that specific models are effective in handling multilingual classification, though it notes challenges such as grammatical inconsistencies and dataset limitations.

The sentiment analysis task in Tamil-English code-mixed texts has been explored using advanced transformer-based models. Challenges from grammatical inconsistencies, orthographic variations, and phonetic ambiguities have been addressed. The limitations of existing datasets and annotation gaps have been examined, emphasizing the need for larger and more diverse corpora. Transformer architectures, including XLM-RoBERTa, mT5, IndicBERT, and RemBERT, have been evaluated in low-resource, code-mixed environments. Performance metrics have been analyzed, highlighting the effectiveness of specific models in handling multilingual sentiment classification. The findings suggest that further advancements in data augmentation, phonetic normalization, and hybrid modeling approaches are required to enhance accuracy. Future research directions for improving sentiment analysis in code-mixed texts have been proposed.
