CL · Feb 25

Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text

arXiv:2602.21933v1 · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This paper addresses sarcasm detection for low-resource multilingual NLP, but its contribution is incremental: it compares existing methods on a specific dataset.

This study tackled sarcasm detection in code-mixed Hinglish text by comparing large language models with a fine-tuned DistilBERT model, finding that the fine-tuned model achieved 84% accuracy, outperforming the LLMs.

Sarcasm detection in multilingual and code-mixed environments remains a challenging task for natural language processing models due to structural variations, informal expressions, and limited linguistic resources. This study compares four large language models (Llama 3.1, Mistral, Gemma 3, and Phi-4) with a fine-tuned DistilBERT model for sarcasm detection in code-mixed Hinglish text. The results indicate that the smaller, sequentially fine-tuned DistilBERT model achieved the highest overall accuracy of 84%, outperforming all of the LLMs in zero-shot and few-shot setups, while using only minimal LLM-generated code-mixed data for fine-tuning. These findings indicate that domain-adaptive fine-tuning of smaller transformer-based models may significantly improve sarcasm detection over general LLM inference in low-resource and data-scarce settings.
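
As a rough illustration of the zero-shot and few-shot LLM setups mentioned in the abstract, the sketch below builds a classification prompt, maps a free-form model reply onto a label, and scores predictions by accuracy. It is not the authors' code: the prompt wording, the label strings, the fallback rule, and the function names `build_prompt`, `parse_label`, and `accuracy` are all assumptions made for illustration, and the call to an actual LLM is deliberately left out.

```python
from typing import Sequence, Tuple

# Hypothetical label strings; the paper's actual label set is not given here.
POSITIVE = "sarcastic"
NEGATIVE = "not sarcastic"


def build_prompt(text: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a prompt: zero-shot if `examples` is empty, few-shot otherwise."""
    lines = [
        f"Classify the Hinglish sentence as '{POSITIVE}' or '{NEGATIVE}'.",
        "",
    ]
    # Each (sentence, label) pair becomes one in-context demonstration.
    for ex_text, ex_label in examples:
        lines += [f"Sentence: {ex_text}", f"Label: {ex_label}", ""]
    lines += [f"Sentence: {text}", "Label:"]
    return "\n".join(lines)


def parse_label(reply: str) -> str:
    """Map a free-form model reply onto one of the two labels."""
    cleaned = reply.strip().lower()
    # Check the negative label first, since it contains the word "sarcastic".
    if "not sarcastic" in cleaned or "non-sarcastic" in cleaned:
        return NEGATIVE
    if "sarcastic" in cleaned:
        return POSITIVE
    # Assumed fallback for unparseable replies.
    return NEGATIVE


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

Under these assumptions, calling `build_prompt(sentence)` gives the zero-shot prompt, and passing a handful of labelled `(sentence, label)` pairs as `examples` gives the few-shot variant; the same `accuracy` function could then score either setup or a fine-tuned classifier on one shared test set.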
