CL · Mar 27

EnTaCs: Analyzing the Relationship Between Sentiment and Language Choice in English-Tamil Code-Switching

arXiv:2603.26587 · 64.0 · h-index: 1
Predicted impact top 96% in CL · last 90 days · Originality: Incremental advance
AI Analysis

This paper provides empirical evidence for socio-linguistic theories about how emotion influences code-switching patterns in multilingual speakers.

The study examines how sentiment affects language choice in English-Tamil code-switched text. It finds that positive utterances have an English proportion of 34.3%, compared with 24.8% for negative utterances, and that mixed-sentiment utterances show the highest language switch frequency.

This paper investigates the relationship between utterance sentiment and language choice in English-Tamil code-switched text, using methods from machine learning and statistical modelling. We apply a fine-tuned XLM-RoBERTa model for token-level language identification on 35,650 romanized YouTube comments from the DravidianCodeMix dataset, producing per-utterance measurements of English proportion and language switch frequency. Linear regression analysis reveals that positive utterances exhibit significantly greater English proportion (34.3%) than negative utterances (24.8%), and mixed-sentiment utterances show the highest language switch frequency when controlling for utterance length. These findings support the hypothesis that emotional content demonstrably influences language choice in multilingual code-switching settings, due to socio-linguistic associations of prestige and identity with embedded and matrix languages.
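
The two per-utterance measurements named in the abstract can be illustrated with a minimal sketch. It assumes the language identifier has already produced one tag per token, and that the tags are the hypothetical labels "en" and "ta" (any other label, such as one for punctuation or names, is ignored). The function names and the exact definition of a switch (a change of tag between adjacent language tokens) are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical tag set: the real label inventory of the paper's tagger is not given here.
LANGUAGE_TAGS = {"en", "ta"}


def language_tokens(tags: List[str]) -> List[str]:
    """Keep only tokens tagged as English or Tamil."""
    return [t for t in tags if t in LANGUAGE_TAGS]


def english_proportion(tags: List[str]) -> float:
    """Fraction of language tokens tagged 'en'; 0.0 if there are none."""
    langs = language_tokens(tags)
    if not langs:
        return 0.0
    return sum(1 for t in langs if t == "en") / len(langs)


def switch_count(tags: List[str]) -> int:
    """Number of adjacent language-token pairs whose tags differ."""
    langs = language_tokens(tags)
    return sum(1 for a, b in zip(langs, langs[1:]) if a != b)


def utterance_features(tags: List[str]) -> Dict[str, float]:
    """Bundle the measurements with utterance length, which a regression
    on switch frequency would need as a control variable."""
    return {
        "length": len(language_tokens(tags)),
        "english_proportion": english_proportion(tags),
        "switches": switch_count(tags),
    }


if __name__ == "__main__":
    # Toy tag sequence, not taken from the dataset.
    example = ["ta", "ta", "en", "en", "ta"]
    print(utterance_features(example))
```

Keeping utterance length alongside the switch count reflects the abstract's note that switch frequency is compared while controlling for length, since longer utterances have more opportunities to switch.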

