CL, Oct 25, 2025

Irony Detection in Urdu Text: A Comparative Study Using Machine Learning Models and Large Language Models

Fiaz Ahmad, Nisar Hussain, Amna Qasim, Momina Hafeez, Muhammad Usman, Grigori Sidorov, Alexander Gelbukh

arXiv:2510.22356v1

h-index: 21

Originality: Synthesis-oriented

AI Analysis

This paper addresses irony detection for Urdu, a low-resource language, but its contribution is incremental: it applies existing methods to new data.

The paper tackles irony detection in Urdu by translating an English irony corpus into Urdu and evaluating both machine learning models and large language models. Gradient Boosting achieves an F1-score of 89.18%, and LLaMA 3 (8B) achieves an F1-score of 94.61%.

Irony identification is a challenging task in Natural Language Processing, particularly when dealing with languages that differ in syntax and cultural context. In this work, we aim to detect irony in Urdu by translating an English Ironic Corpus into the Urdu language. We evaluate ten state-of-the-art machine learning algorithms using GloVe and Word2Vec embeddings, and compare their performance with classical methods. Additionally, we fine-tune advanced transformer-based models, including BERT, RoBERTa, LLaMA 2 (7B), LLaMA 3 (8B), and Mistral, to assess the effectiveness of large-scale models in irony detection. Among machine learning models, Gradient Boosting achieved the best performance with an F1-score of 89.18%. Among transformer-based models, LLaMA 3 (8B) achieved the highest performance with an F1-score of 94.61%. These results demonstrate that combining translation techniques with modern NLP models enables robust irony detection in Urdu, a historically low-resource language.
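The results above are reported as F1-scores. As a reminder of what that metric measures, here is a minimal sketch in plain Python. It assumes a binary task with "ironic" as the positive class (label 1); the abstract does not state which averaging scheme the authors used, and the function name and toy labels below are purely illustrative, not data from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = ironic, 0 = not ironic.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2%}")  # F1 = 75.00%
```

In the toy example there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1-score is 0.75.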
