DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing

arXiv:2604.07965

AI Analysis

The work addresses the challenge of precise, non-interfering knowledge updates for VLMs in continual learning scenarios, with incremental improvements over existing methods.

The paper tackles lifelong editing in Vision Language Models (VLMs), where sequential edits disrupt previously learned concepts because representations are entangled. It introduces Dynamic Subspace Concept Alignment (DSCA), which structurally isolates concepts in orthogonal semantic subspaces, and it reports over 95% edit success after 1000 sequential edits and a 3-5% reduction in hallucination.

Model editing aims to update a model's knowledge, adding new concepts and changing relevant information, without retraining. Lifelong editing is challenging because it is prone to disrupting previously learned concepts, especially in Vision Language Models (VLMs), where sequential edits can degrade reasoning and cause cross-modal misalignment. Existing VLM knowledge editing methods based on gated adapters, activation edits, and parameter merging address the catastrophic forgetting seen in full fine-tuning; however, they still operate in the shared representation space of the VLM, where concepts are entangled, so edits interfere with unrelated concepts. We hypothesize that this instability persists because current methods control edits algorithmically, via optimization, rather than structurally separating knowledge. We introduce Dynamic Subspace Concept Alignment (DSCA), which mitigates this limitation by design: it decomposes the representation space into a set of orthogonal semantic subspaces and confines edits to those transformed spaces. These subspaces are obtained through incremental clustering and PCA on joint vision-language representations. This process structurally isolates concepts, enabling precise, non-interfering edits by turning isolation from a soft training objective into an architectural property. The surgical edits are guided by a multi-term loss function that maintains task fidelity, edit locality, and cross-modal alignment. With the base model frozen, our method achieves 98 percent single-edit success, remains above 95 percent after 1000 sequential edits, lowers hallucination by 3 to 5 percent, and achieves the best backward transfer (BWT) scores on continual instruction tuning benchmarks. Extensive experiments demonstrate DSCA's state-of-the-art stability and knowledge retention in continual lifelong editing across various datasets and benchmarks.
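To make the idea of editing inside orthogonal subspaces concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the concept clusters are already given (the incremental clustering step is skipped), uses random vectors in place of joint vision-language features, and enforces orthogonality between subspaces by projecting each new PCA basis away from the earlier ones. All function names and sizes here are hypothetical.

```python
import numpy as np

def pca_basis(X, k):
    """Return a (d, k) orthonormal basis of the top-k principal directions of X (n, d)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T

def orthogonalize_against(B, used=None):
    """Strip from B any component in span(used), then re-orthonormalize with QR."""
    if used is not None:
        B = B - used @ (used.T @ B)
    q, _ = np.linalg.qr(B)
    return q

def edit_in_subspace(h, B, delta):
    """Apply a k-dimensional update `delta` to h, confined to the subspace spanned by B."""
    return h + B @ delta

# Toy stand-ins for two concept clusters of joint features (assumption: random data).
rng = np.random.default_rng(0)
d, k = 16, 3
X1 = rng.normal(size=(50, d))
X2 = rng.normal(size=(50, d)) + 2.0

B1 = orthogonalize_against(pca_basis(X1, k))
B2 = orthogonalize_against(pca_basis(X2, k), used=B1)

h = rng.normal(size=d)        # a representation to edit
delta = rng.normal(size=k)    # an edit expressed in concept 2's coordinates
h_new = edit_in_subspace(h, B2, delta)

# Concept 1's coordinates are untouched by an edit made in concept 2's subspace.
print(np.allclose(B1.T @ h_new, B1.T @ h))
```

Because the two bases are mutually orthogonal, the update changes the representation only along concept 2's directions. This is the sense in which isolation becomes a property of the structure rather than something a loss term has to encourage.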
