CL, Jun 17, 2025

Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding

Yeonkyoung So, Gyuseong Lee, Sungmok Jung, Joonhak Lee, JiA Kang, Sangho Kim, Jaejin Lee

arXiv:2506.14397v210.93 citations, h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses a fundamental linguistic challenge for LLM researchers, but it is incremental: it builds on existing work by focusing specifically on negation.

The authors address the lack of benchmarks dedicated to negation understanding in LLMs by introducing Thunder-NUBench, a benchmark that assesses sentence-level negation by contrasting standard negation with structurally diverse alternatives such as local negation, contradiction, and paraphrase. The result is a manually curated dataset for evaluation.

Negation is a fundamental linguistic phenomenon that poses persistent challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Existing benchmarks often treat negation as a side case within broader tasks like natural language inference, resulting in a lack of benchmarks that exclusively target negation understanding. In this work, we introduce Thunder-NUBench, a novel benchmark explicitly designed to assess sentence-level negation understanding in LLMs. Thunder-NUBench goes beyond surface-level cue detection by contrasting standard negation with structurally diverse alternatives such as local negation, contradiction, and paraphrase. The benchmark consists of manually curated sentence-negation pairs and a multiple-choice dataset that enables in-depth evaluation of models' negation understanding.
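
As a rough illustration of how a multiple-choice negation item of this kind might be represented and scored, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `NegationItem` class, its field names, the choice labels, and the example sentences are hypothetical and are not taken from the Thunder-NUBench dataset or its actual schema. The sketch also shows why surface-level cue detection alone may not be enough: both the standard negation and the local negation contain the cue "not".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NegationItem:
    """Hypothetical multiple-choice item (not the benchmark's real schema)."""
    sentence: str            # original affirmative sentence
    choices: Dict[str, str]  # choice label -> candidate sentence
    answer: str              # label of the correct (standard) negation


def accuracy(items: List[NegationItem],
             predict: Callable[[NegationItem], str]) -> float:
    """Fraction of items for which predict() returns the correct label."""
    if not items:
        return 0.0
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)


def cue_candidates(item: NegationItem, cue: str = "not") -> List[str]:
    """Labels of all choices that contain the negation cue as a whole word."""
    return [label for label, text in item.choices.items()
            if cue in text.lower().rstrip(".").split()]


# Invented example sentences, for illustration only.
ITEM = NegationItem(
    sentence="The student who studied hard passed the exam.",
    choices={
        "standard_negation": "The student who studied hard did not pass the exam.",
        "local_negation": "The student who did not study hard passed the exam.",
        "contradiction": "The student who studied hard skipped the exam.",
        "paraphrase": "The hard-working student succeeded in the exam.",
    },
    answer="standard_negation",
)

if __name__ == "__main__":
    # Cue matching alone cannot separate the two choices that contain "not".
    print(cue_candidates(ITEM))
    # A predictor that always picks the standard negation scores perfectly here.
    print(accuracy([ITEM], lambda item: "standard_negation"))
```

In this sketch the predictor is any function from an item to a choice label, so a call to an LLM could be substituted for the lambda without changing the scoring code.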

