CL | May 18, 2025

What Are They Talking About? A Benchmark of Knowledge-Grounded Discussion Summarization

Weixiao Zhou, Junnan Zhu, Gengyao Li, Xianfu Cheng, Xinnian Liang, Feifei Zhai, Zhoujun Li

arXiv:2505.12474v34.91 citations | h-index: 16 | Has Code | IJCNLP-AACL

Originality: Incremental advance

AI Analysis

The paper addresses the problem of generating clear summaries for discussions that rely on shared background knowledge. The advance is incremental: it builds on traditional dialogue summarization by adding knowledge grounding.

The authors tackle the summarization of discussions that rely on shared background knowledge, where traditional methods produce confusing summaries because participants omit context and use implicit references. They introduce Knowledge-Grounded Discussion Summarization (KGDS) and construct the first benchmark for it, with news-discussion pairs and expert annotations. Their evaluation of 12 advanced LLMs shows that KGDS remains a significant challenge: models frequently miss key facts and fail to resolve implicit references.

Traditional dialogue summarization primarily focuses on dialogue content, assuming it comprises adequate information for a clear summary. However, this assumption often fails for discussions grounded in shared background, where participants frequently omit context and use implicit references. This results in summaries that are confusing to readers unfamiliar with the background. To address this, we introduce Knowledge-Grounded Discussion Summarization (KGDS), a novel task that produces a supplementary background summary for context and a clear opinion summary with clarified references. To facilitate research, we construct the first KGDS benchmark, featuring news-discussion pairs and expert-created multi-granularity gold annotations for evaluating sub-summaries. We also propose a novel hierarchical evaluation framework with fine-grained and interpretable metrics. Our extensive evaluation of 12 advanced large language models (LLMs) reveals that KGDS remains a significant challenge. The models frequently miss key facts and retain irrelevant ones in background summarization, and often fail to resolve implicit references in opinion summary integration.
