CL May 29, 2025

ScEdit: Script-based Assessment of Knowledge Editing

Xinye Li, Zunwen Zheng, Qian Zhang, Dekai Zhuang, Jiabao Kang, Liyan Xu, Qingbin Liu, Xi Chen, Zhiying Tu, Dianhui Chu, Dianbo Sui

arXiv:2505.23291v28.33 citations | h-index: 7 | Has Code | ACL

Originality: Synthesis-oriented

AI Analysis

ScEdit addresses the gap between how knowledge editing methods are evaluated and how they would be used in practical scenarios such as LLM-as-agent applications. The contribution is incremental: it is a benchmark, not a new editing method.

Current knowledge editing evaluation frameworks are too simple to reflect real-world applications. To address this, the authors built ScEdit, a script-based benchmark that extends evaluation from fact-based questions to action-based questions. All existing knowledge editing methods lose performance on the benchmark: scores on established metrics decline, and text-level metrics prove particularly challenging.

Knowledge Editing (KE) has gained increasing attention, yet current KE tasks remain relatively simple. Under current evaluation frameworks, many editing methods achieve exceptionally high scores, sometimes nearing perfection. However, few studies integrate KE into real-world application scenarios (e.g., recent interest in LLM-as-agent). To support our analysis, we introduce a novel script-based benchmark -- ScEdit (Script-based Knowledge Editing Benchmark) -- which encompasses both counterfactual and temporal edits. We integrate token-level and text-level evaluation methods, comprehensively analyzing existing KE techniques. The benchmark extends traditional fact-based ("What"-type question) evaluation to action-based ("How"-type question) evaluation. We observe that all KE methods exhibit a drop in performance on established metrics and face challenges on text-level metrics, indicating a challenging task. Our benchmark is available at https://github.com/asdfo123/ScEdit.
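The distinction between token-level and text-level evaluation can be illustrated with a minimal sketch. This is not ScEdit's actual scoring code: the `EditedOutput` record, both scoring functions, and the example edit are hypothetical. The token-level check assumes the probability comparison commonly used for counterfactual edits, and the text-level check is a crude string-matching stand-in for whatever script-level judgment the benchmark applies.

```python
from dataclasses import dataclass


@dataclass
class EditedOutput:
    """Hypothetical record of an edited model's answer to a "How"-type question."""
    p_new: float  # probability assigned to the post-edit target
    p_old: float  # probability assigned to the pre-edit target
    script: str   # generated step-by-step script


def token_level_success(out: EditedOutput) -> bool:
    # Assumed token-level criterion: the edited target is more likely
    # than the original one.
    return out.p_new > out.p_old


def text_level_success(out: EditedOutput, new_entity: str, old_entity: str) -> bool:
    # Assumed text-level proxy: the whole script must mention the new
    # entity and must not fall back to the old one anywhere.
    text = out.script.lower()
    return new_entity.lower() in text and old_entity.lower() not in text


# Illustrative counterfactual edit: "The Eiffel Tower is located in Rome."
example = EditedOutput(
    p_new=0.62,
    p_old=0.21,
    script=(
        "1. Book a flight to Rome. "
        "2. Take the metro to the Eiffel Tower. "
        "3. Return to your hotel in Paris."
    ),
)

token_ok = token_level_success(example)
text_ok = text_level_success(example, new_entity="Rome", old_entity="Paris")
print(f"token-level success: {token_ok}, text-level success: {text_ok}")
```

In this made-up case the edit passes the token-level check but fails the text-level one, because the pre-edit fact resurfaces in a later step of the script. That is one way a method could score well on established metrics and still struggle on text-level ones.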
