CL, AI | Mar 16

MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization

arXiv:2603.12677 | 84.2 | h-index: 1

AI Analysis

This work addresses a critical bottleneck in precisely editing knowledge in LLMs without disrupting general capabilities, offering a novel approach with potential broad impact in AI safety and model maintenance.

The paper tackles knowledge editing in Large Language Models, where existing methods suffer from a semantic-execution disconnect. It proposes MetaKE, a bi-level optimization framework that aligns edit targets with the model's feasible manifold, and reports significant improvements over baselines.

Knowledge editing (KE) aims to precisely rectify specific knowledge in Large Language Models (LLMs) without disrupting general capabilities. State-of-the-art methods suffer from an open-loop control mismatch. We identify a critical "Semantic-Execution Disconnect": the semantic target is derived independently without feedback from the downstream's feasible region. This misalignment often causes valid semantic targets to fall within the prohibited space, resulting in gradient truncation and editing failure. To bridge this gap, we propose MetaKE (Meta-learning Aligned Knowledge Editing), a new framework that reframes KE as a bi-level optimization problem. Departing from static calculation, MetaKE treats the edit target as a learnable meta-parameter: the upper-level optimizer seeks a feasible target to maximize post-edit performance, while the lower-level solver executes the editing. To address the challenge of differentiating through complex solvers, we derive a Structural Gradient Proxy, which explicitly backpropagates editability constraints to the target learning phase. Theoretical analysis demonstrates that MetaKE automatically aligns the edit direction with the model's feasible manifold. Extensive experiments confirm that MetaKE significantly outperforms strong baselines, offering a new perspective on knowledge editing.
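The bi-level idea in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's algorithm and does not implement the Structural Gradient Proxy. It assumes a 2-D output space, a lower-level "solver" that can only move along one feasible direction `u`, and an upper-level objective with a penalty weight `lam`. All function names, the objective, and the constants are hypothetical choices made for illustration.

```python
# Toy bi-level loop. Hypothetical illustration only, not MetaKE itself.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lower_level_solve(target, start, u):
    """Lower level: move `start` toward `target`, but only along the unit
    vector `u` (an assumed stand-in for the solver's feasible region)."""
    step = dot([t - s for t, s in zip(target, start)], u)
    return [s + step * ui for s, ui in zip(start, u)]

def upper_level_grad(target, semantic_goal, start, u, lam):
    """Analytic gradient with respect to `target` of the assumed objective
        ||achieved - semantic_goal||^2 + lam * ||target - achieved||^2
    where achieved = lower_level_solve(target, start, u)."""
    achieved = lower_level_solve(target, start, u)
    miss = [a - g for a, g in zip(achieved, semantic_goal)]
    infeasible = [t - a for t, a in zip(target, achieved)]
    along_u = dot(miss, u)
    return [2 * along_u * ui + 2 * lam * r for ui, r in zip(u, infeasible)]

def learn_target(semantic_goal, start, u, lam=0.5, lr=0.5, steps=50):
    """Upper level: treat the edit target as a learnable parameter."""
    target = list(semantic_goal)  # open-loop initialisation
    for _ in range(steps):
        grad = upper_level_grad(target, semantic_goal, start, u, lam)
        target = [t - lr * g for t, g in zip(target, grad)]
    return target

start = [0.0, 0.0]
u = [1.0, 0.0]               # the solver can only move along the x-axis
semantic_goal = [2.0, 1.0]   # has a component the solver cannot reach

open_loop = lower_level_solve(semantic_goal, start, u)
learned = learn_target(semantic_goal, start, u)
closed_loop = lower_level_solve(learned, start, u)
```

In this toy the solver reaches the same output either way. What changes is the target: the open-loop target keeps an unreachable y-component that the solver silently drops, while the learned target shrinks that component toward zero, so the solver can realise it in full. A real editor would replace the projection with the actual editing solver, and the analytic gradient with whatever proxy makes that solver differentiable.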
