LG · Jun 2, 2025

ThinkEval: Practical Evaluation of Knowledge Leakage in LLM Editing using Thought-based Knowledge Graphs

Manit Baser, Dinil Mon Divakaran, Mohan Gurusamy

arXiv:2506.01386v29.41 citations · h-index: 23 · Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

The work addresses a critical issue for deploying LLMs in applications such as healthcare, where outdated knowledge must be updated to prevent harm. The advance is incremental because it builds on existing editing techniques.

The paper tackles the problem of indirect knowledge leakage in LLM editing, where edited-out information can be reconstructed through causal links, and presents ThinkEval to quantify this leakage, showing that five editing techniques struggle to balance suppression with knowledge preservation.

Robust model-editing techniques are essential for deploying large language models (LLMs) in practical applications, offering cost-effective ways to deal with challenges such as privacy breaches, bias mitigation, and the spread of misinformation. For example, an LLM-based healthcare assistant may need to update outdated or incorrect knowledge to prevent harmful recommendations. However, many editing techniques focus on isolated facts and critically fail to prevent indirect knowledge leakage: the unintended reconstruction of edited-out information through persistent causal links and contextual relationships. To assist users in selecting the right editing technique, we develop and present ThinkEval, a framework to systematically quantify indirect knowledge leakage and ripple effects in model editing. ThinkEval builds and employs specialized knowledge graphs to analyze the causal structure of facts before and after editing. To support this approach, we present KnowGIC, a benchmark dataset comprising multi-step reasoning paths that precisely measure these complex knowledge transformation effects. We evaluate five editing techniques (AlphaEdit, RECT, ROME, MEMIT, and PRUNE) across multiple LLMs. Our results show that these techniques struggle to balance indirect fact suppression with the preservation of related knowledge, compromising the contextual integrity of a model's knowledge. Our dataset is available at: https://anonymous.4open.science/r/KnowGIC.
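To make the idea of indirect leakage through multi-step reasoning paths concrete, the following is a minimal illustrative sketch, not the ThinkEval implementation. It assumes a toy knowledge graph of (subject, relation, object) triples, enumerates multi-hop paths from a subject to an edited-out target, and reports the fraction of indirect paths whose every step is still retained after an edit. All entity names, the helper functions `build_graph`, `reasoning_paths`, and `leakage_rate`, and the metric itself are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, relation, object)


def build_graph(edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by subject so outgoing links can be followed."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for subj, rel, obj in edges:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def reasoning_paths(graph, start: str, target: str, max_hops: int = 3) -> List[List[Edge]]:
    """Enumerate simple paths (no repeated nodes) from start to target."""
    paths: List[List[Edge]] = []

    def dfs(node: str, path: List[Edge], visited: Set[str]) -> None:
        if len(path) >= max_hops:
            return
        for rel, obj in graph.get(node, []):
            if obj in visited:
                continue
            step = (node, rel, obj)
            if obj == target:
                paths.append(path + [step])
            else:
                dfs(obj, path + [step], visited | {obj})

    dfs(start, [], {start})
    return paths


def leakage_rate(paths: List[List[Edge]], retained: Set[Edge]) -> float:
    """Fraction of indirect (2+ hop) paths whose every step survives the edit."""
    indirect = [p for p in paths if len(p) >= 2]
    if not indirect:
        return 0.0
    leaked = sum(all(step in retained for step in p) for p in indirect)
    return leaked / len(indirect)


# Hypothetical pre-edit facts; the direct fact is the one being edited out.
edges = [
    ("DrugX", "treats", "ConditionY"),
    ("DrugX", "belongs_to", "ClassZ"),
    ("ClassZ", "indicated_for", "ConditionY"),
    ("DrugX", "studied_in", "TrialQ"),
    ("TrialQ", "targets", "ConditionY"),
]
paths = reasoning_paths(build_graph(edges), "DrugX", "ConditionY")

# Hypothetical post-edit state: the direct fact and one trial link are suppressed.
retained = set(edges) - {
    ("DrugX", "treats", "ConditionY"),
    ("TrialQ", "targets", "ConditionY"),
}
rate = leakage_rate(paths, retained)
print(f"indirect leakage rate: {rate:.2f}")
```

In this toy setup the direct fact is gone, yet one of the two indirect paths still reaches the target, which is the kind of residual route the abstract describes. A real evaluation would query the edited model at each step instead of checking membership in a fixed set.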
