CL · Dec 20, 2024

Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs

arXiv:2412.15993v2 · 14 citations · h-index: 32 · Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Originality: Incremental advance
AI Analysis

This work examines how emotions influence arguments, a question relevant to researchers in computational linguistics and argumentation. Its contribution is incremental: it extends existing studies of binary emotionality to discrete emotion categories.

The study addresses the lack of discrete emotion category annotations in arguments. It crowdsources subjective annotations on a German argument corpus and evaluates LLM-based labeling methods. Emotion categories improve emotionality prediction, but the automatic methods show high recall and low precision for anger and fear, which indicates a bias toward negative emotions.

Arguments evoke emotions, influencing the effect of the argument itself. Not only the emotional intensity but also the category influences the argument's effects, for instance, the willingness to adapt stances. While binary emotionality has been studied in arguments, there is no work on discrete emotion categories (e.g., "Anger") in such data. To fill this gap, we crowdsource subjective annotations of emotion categories in a German argument corpus and evaluate automatic LLM-based labeling methods. Specifically, we compare three prompting strategies (zero-shot, one-shot, chain-of-thought) on three large instruction-tuned language models (Falcon-7b-instruct, Llama-3.1-8B-instruct, GPT-4o-mini). We further vary the definition of the output space to be binary (is there emotionality in the argument?), closed-domain (which emotion from a given label set is in the argument?), or open-domain (which emotion is in the argument?). We find that emotion categories enhance the prediction of emotionality in arguments, emphasizing the need for discrete emotion annotations in arguments. Across all prompt settings and models, automatic predictions show a high recall but low precision for predicting anger and fear, indicating a strong bias toward negative emotions.
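
To make the "high recall but low precision" finding concrete, the following is a minimal sketch of how per-label precision and recall could be computed from gold and predicted emotion labels. It is not the authors' evaluation code: the function name `per_label_precision_recall`, the single-label-per-argument setup, and the toy labels are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import Counter


def per_label_precision_recall(gold, pred):
    """Return {label: (precision, recall)} for single-label predictions.

    Hypothetical helper: assumes exactly one gold label and one
    predicted label per argument.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for this label
        else:
            fp[p] += 1      # predicted label was wrong
            fn[g] += 1      # gold label was missed
    scores = {}
    for label in set(gold) | set(pred):
        predicted = tp[label] + fp[label]   # times the label was predicted
        actual = tp[label] + fn[label]      # times the label was gold
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Made-up toy data (not from the paper): the "model" over-predicts anger and fear.
gold = ["anger", "joy", "none", "fear", "joy", "none"]
pred = ["anger", "anger", "anger", "fear", "fear", "none"]
scores = per_label_precision_recall(gold, pred)
```

In this toy data every gold "anger" and "fear" instance is found (recall 1.0), but many other arguments also receive those labels, so precision for both falls well below recall. That is the pattern the abstract describes as a bias toward negative emotions.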
