LG, CL
Oct 16, 2024

Sarcasm Detection in a Less-Resourced Language

arXiv:2410.12704v1
2 citations
h-index: 1
Proceedings of Slovenian Conference on Artificial Intelligence 2024
Originality: Synthesis-oriented
AI Analysis

This work addresses sarcasm detection for less-resourced languages, but it is incremental as it applies existing methods to a new dataset.

The paper tackles sarcasm detection in Slovenian, a less-resourced language, using machine translation and large language models. Its best ensemble approach achieves an F1-score of 0.765, close to annotators' agreement in the source language.

The sarcasm detection task in natural language processing tries to classify whether an utterance is sarcastic or not. It is related to sentiment analysis since sarcasm often inverts surface sentiment. Because sarcastic sentences are highly dependent on context and are often accompanied by various non-verbal cues, the task is challenging. Most related work focuses on high-resourced languages like English. To build a sarcasm detection dataset for a less-resourced language, such as Slovenian, we leverage two modern techniques: a medium-sized transformer model specialized for machine translation, and a very large generative language model. We explore the viability of translated datasets and how the size of a pretrained transformer affects its ability to detect sarcasm. We train ensembles of detection models and evaluate the models' performance. The results show that larger models generally outperform smaller ones and that ensembling can slightly improve sarcasm detection performance. Our best ensemble approach achieves an $\text{F}_1$-score of 0.765, which is close to annotators' agreement in the source language.
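To make the ensembling and the F1 metric mentioned in the abstract concrete, here is a minimal sketch in plain Python. It assumes majority voting over binary labels (1 = sarcastic, 0 = not sarcastic); the abstract does not say how the ensembles are combined, so the voting rule, the function names `majority_vote` and `f1_score`, and the toy labels are all illustrative assumptions, not the paper's method or data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label lists (one list per model) by majority vote.

    Assumption: binary labels, 1 = sarcastic, 0 = not sarcastic.
    """
    n_items = len(predictions[0])
    combined = []
    for i in range(n_items):
        votes = Counter(model_preds[i] for model_preds in predictions)
        # Ties fall to label 0; an arbitrary choice made for this sketch.
        combined.append(1 if votes[1] > votes[0] else 0)
    return combined


def f1_score(gold, predicted, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up toy labels for illustration only (not from the paper).
gold = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 0, 0, 0]
model_c = [0, 0, 1, 1, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print("single model F1:", f1_score(gold, model_a))
print("ensemble F1:", f1_score(gold, ensemble))
```

In this toy setup each model makes different mistakes, so the vote can cancel them out. On real data the gain is typically much smaller, in line with the abstract's statement that ensembling only slightly improves performance.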

Code Implementations: 1 repo