CL LG | Jun 7, 2024

SUMIE: A Synthetic Benchmark for Incremental Entity Summarization

Eunjeong Hwang, Yichao Zhou, Beliz Gunel, James Bradley Wendt, Sandeep Tata

arXiv:2406.05079v1 | 12.920 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of keeping knowledge in AI systems accurate, which matters to both researchers and developers. Its contribution is incremental, since it focuses on dataset creation rather than a new method.

The paper tackles the lack of a dataset for testing incremental entity summarization in language models by introducing SUMIE, a synthetic benchmark that exposes real-world challenges. State-of-the-art LLMs reach an F1 score of no more than 80.4% on the task.

No existing dataset adequately tests how well language models can incrementally update entity summaries - a crucial ability as these models rapidly advance. The Incremental Entity Summarization (IES) task is vital for maintaining accurate, up-to-date knowledge. To address this, we introduce SUMIE, a fully synthetic dataset designed to expose real-world IES challenges. This dataset effectively highlights problems like incorrect entity association and incomplete information presentation. Unlike common synthetic datasets, ours captures the complexity and nuances found in real-world data. We generate informative and diverse attributes, summaries, and unstructured paragraphs in sequence, ensuring high quality. The alignment between generated summaries and paragraphs exceeds 96%, confirming the dataset's quality. Extensive experiments demonstrate the dataset's difficulty - state-of-the-art LLMs struggle to update summaries with an F1 higher than 80.4%. We will open source the benchmark and the evaluation metrics to help the community make progress on IES tasks.
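
To make the reported F1 concrete, the sketch below scores a predicted entity summary against a gold one. It is only an illustration under stated assumptions: a summary is modeled as a mapping from attribute to a set of values, and a predicted (attribute, value) pair counts as correct only on exact match. The names `to_pairs` and `summary_f1` and the sample data are hypothetical, and the benchmark's own evaluation metrics may match values differently.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (attribute, value)


def to_pairs(summary: Dict[str, Set[str]]) -> Set[Pair]:
    """Flatten an attribute -> values mapping into a set of (attribute, value) pairs."""
    return {(attr, val) for attr, vals in summary.items() for val in vals}


def summary_f1(
    predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match pairs (an assumption)."""
    pred_pairs = to_pairs(predicted)
    gold_pairs = to_pairs(gold)
    if not pred_pairs and not gold_pairs:
        return 1.0, 1.0, 1.0  # both empty: treat as a perfect match
    true_positives = len(pred_pairs & gold_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: the prediction misses one gold value and
# attaches one value that the gold summary does not contain.
gold = {"hours": {"9am-5pm"}, "menu": {"vegan options", "espresso"}}
predicted = {"hours": {"9am-5pm"}, "menu": {"espresso"}, "parking": {"free"}}

precision, recall, f1 = summary_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the missed value lowers recall (incomplete information) and the extra value lowers precision (information attached where it does not belong), the two failure modes the abstract highlights.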
