CL · AI · LG · Jun 1, 2025

COMPKE: Complex Question Answering under Knowledge Editing

Keyuan Cheng, Zijian Kan, Zhixian He, Zhuoran Zhang, Muhammad Asif Ali, Ke Xu, Lijie Hu, Di Wang

arXiv:2506.00829v2 · 12.06 citations · h-index: 29 · Has Code · ACL

Originality: Incremental advance

AI Analysis

This work addresses a gap in assessing how well edited models apply updated knowledge in complex reasoning tasks, a question relevant to researchers and practitioners in AI and NLP. The contribution is incremental, as it builds on existing knowledge editing benchmarks.

The authors address the evaluation of knowledge editing in large language models by introducing COMPKE, a benchmark of 11,924 complex questions that reflect real-life scenarios. They find that the effectiveness of editing methods varies significantly across models: for example, MeLLo reaches an accuracy of 39.47 on GPT-4O-MINI but only 3.83 on QWEN2.5-3B.

Knowledge Editing, which efficiently modifies the knowledge in large language models, has garnered considerable attention. Current benchmarks primarily use multi-hop question answering to assess and analyze newly injected or updated knowledge. However, we argue that these benchmarks fail to effectively evaluate how well the updated models apply this knowledge in real-life scenarios, particularly when questions require complex reasoning involving one-to-many relationships or multi-step logical intersections. To fill this gap, we introduce a new benchmark, COMPKE: Complex Question Answering under Knowledge Editing, which includes 11,924 complex questions that reflect real-life situations. We conduct an extensive evaluation of four knowledge editing methods on COMPKE, revealing that their effectiveness varies notably across different models. For instance, MeLLo attains an accuracy of 39.47 on GPT-4O-MINI, but this drops sharply to 3.83 on QWEN2.5-3B. We further investigate the underlying causes of these disparities from both methodological and model-specific perspectives. The datasets are available at https://github.com/kzjkzj666/CompKE.
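To make "one-to-many relationships" and "logical intersections" concrete, here is a minimal sketch of how a single edit can change the answer to an intersection-style question. It is a toy illustration only: the triples, entity names, and helper functions (`build_kb`, `apply_edit`, `answer_intersection`) are hypothetical and do not reflect COMPKE's actual data format or any of the evaluated editing methods.

```python
from collections import defaultdict


def build_kb(triples):
    """Map (subject, relation) to the set of all its objects (one-to-many)."""
    kb = defaultdict(set)
    for subject, relation, obj in triples:
        kb[(subject, relation)].add(obj)
    return kb


def apply_edit(kb, subject, relation, old_obj, new_obj):
    """Swap one object of a one-to-many fact, leaving the other objects intact."""
    kb[(subject, relation)].discard(old_obj)
    kb[(subject, relation)].add(new_obj)


def answer_intersection(kb, queries):
    """Answer a question defined as the intersection of several one-hop answer sets."""
    answer_sets = [kb.get(query, set()) for query in queries]
    return set.intersection(*answer_sets) if answer_sets else set()


# Hypothetical toy facts (assumption: not taken from the benchmark).
triples = [
    ("company_a", "founder", "alice"),
    ("company_a", "founder", "bob"),
    ("company_b", "board_member", "bob"),
    ("company_b", "board_member", "carol"),
]
kb = build_kb(triples)

# "Who founded company_a and also sits on the board of company_b?"
question = [("company_a", "founder"), ("company_b", "board_member")]

before = answer_intersection(kb, question)
apply_edit(kb, "company_a", "founder", "bob", "carol")
after = answer_intersection(kb, question)

print("before edit:", before)
print("after edit:", after)
```

In this sketch the edit replaces only one of two founders, so a correct answer requires keeping the unedited founder, using the new one, and recomputing the intersection. A single-chain multi-hop question would not exercise that combination.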

