Out-of-Distribution Generalization via Invariant Trajectories for Multimodal Large Language Model Editing
The paper tackles knowledge editing in multimodal large language models (MLLMs) by reformulating it as an out-of-distribution (OOD) generalization problem. It proposes ODEdit, which aims to improve editing reliability, locality, and generality through invariant trajectory learning.
Correcting incorrect or outdated knowledge in MLLMs matters for applications that require robust cross-modal reasoning. The contribution nonetheless appears incremental, since it builds on existing editing methods by adapting them to multimodal settings.
Knowledge editing has emerged as a crucial technique for efficiently correcting incorrect or outdated knowledge in large language models (LLMs). Existing editing methods for unimodal LLMs rely on a rigid parameter-to-output mapping, which causes causal-underfit and causal-overfit in the cascaded reasoning of multimodal LLMs (MLLMs). In this paper, we reformulate MLLM editing as an out-of-distribution (OOD) generalization problem, where the goal is to distinguish semantic shift from factual shift and thus achieve robust editing across diverse cross-modal prompts. The key challenge of this OOD problem lies in identifying invariant causal trajectories that generalize accurately while suppressing spurious correlations. To address it, we propose ODEdit, a plug-and-play, invariant-learning-based framework that optimizes a tripartite OOD risk objective to simultaneously enhance editing reliability, locality, and generality. We further introduce an edit trajectory invariant learning method, which integrates a total variation penalty into the risk minimization objective to stabilize edit trajectories against environmental variations. Theoretical analysis and extensive experiments demonstrate the effectiveness of ODEdit.
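The abstract does not give the exact form of the objective, so the following is only a hypothetical sketch of how a tripartite risk with a total variation penalty might be combined. It assumes that the three risks (reliability on edit prompts, locality on unrelated prompts, generality on rephrased or cross-modal prompts) are already computed as scalars. It further assumes that each "environment" (for example, one prompting variant) yields an edit trajectory represented as a fixed-length vector of per-layer values. The names `reliability_risk`, `locality_risk`, `generality_risk`, `trajectories`, `weights`, and `tv_weight` are illustrative and are not ODEdit's actual interface. The penalty here is the discrete total variation along an assumed ordering of environments, that is, the sum of L1 distances between consecutive trajectories.

```python
from typing import List, Tuple


def total_variation(trajectories: List[List[float]]) -> float:
    """Discrete total variation across environments (illustrative assumption).

    Sums the L1 distance between the edit trajectories of consecutive
    environments; identical trajectories give a penalty of 0.
    """
    tv = 0.0
    for prev, curr in zip(trajectories, trajectories[1:]):
        if len(prev) != len(curr):
            raise ValueError("all trajectories must have the same length")
        tv += sum(abs(a - b) for a, b in zip(prev, curr))
    return tv


def tripartite_objective(
    reliability_risk: float,
    locality_risk: float,
    generality_risk: float,
    trajectories: List[List[float]],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    tv_weight: float = 0.1,
) -> float:
    """Hypothetical weighted sum of the three editing risks plus a TV penalty."""
    w_rel, w_loc, w_gen = weights
    risk = w_rel * reliability_risk + w_loc * locality_risk + w_gen * generality_risk
    # The penalty grows when edit trajectories differ across environments,
    # pushing the edit toward behavior that is invariant to the prompt variant.
    return risk + tv_weight * total_variation(trajectories)


if __name__ == "__main__":
    # Three toy environments, each with a 3-layer trajectory.
    envs = [[1.0, 2.0, 3.0], [1.0, 3.0, 3.0], [2.0, 3.0, 1.0]]
    print(total_variation(envs))  # 1.0 + 3.0 = 4.0
    print(tripartite_objective(1.0, 2.0, 3.0, envs, tv_weight=0.5))  # 6.0 + 2.0 = 8.0
```

Because this penalty depends on how environments are ordered, summing L1 distances over all pairs of environments would be an order-independent alternative. Which variant, if either, matches the paper is not stated in the abstract.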