CL AI DL | Mar 28, 2025

Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish

Kevin Cohen, Laura Manrique-Gómez, Rubén Manrique

arXiv:2503.22585v1 | 17.612 citations | h-index: 2 | Has Code | Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

Originality: Synthesis-oriented

AI Analysis

It addresses irony detection for historical Spanish language analysis, with incremental improvements in dataset creation and annotation methodology.

This study tackled irony detection in 19th-century Spanish texts using large language models such as BERT and GPT-4o. It produced a new historical dataset and a semi-automated annotation method that reduced class imbalance and improved annotation quality.

This study explores the use of large language models (LLMs) to enhance datasets and improve irony detection in 19th-century Latin American newspapers. Two strategies were employed to evaluate how well BERT and GPT-4o models capture the subtle, nuanced nature of irony, through both multi-class and binary classification tasks. First, we implemented dataset enhancements focused on enriching emotional and contextual cues; however, these showed limited impact on historical language analysis. The second strategy, a semi-automated annotation process, effectively addressed class imbalance and augmented the dataset with high-quality annotations. Despite the challenges posed by the complexity of irony, this work advances sentiment analysis through two key contributions: a new historical Spanish dataset tagged for sentiment analysis and irony detection, and a semi-automated annotation methodology in which human expertise is crucial for refining LLM results, enriched by incorporating historical and cultural contexts as core features.
