CV · AI · Feb 2

ReasonEdit: Editing Vision-Language Models using Human Reasoning

arXiv:2602.02408v2 · h-index: 8
AI Analysis

The paper addresses a practical gap in model editing for vision-language models, enabling more effective corrections in reasoning-heavy scenarios.

The paper tackles the problem of correcting errors in vision-language models on reasoning-heavy tasks. It introduces ReasonEdit, which incorporates human reasoning during editing and achieves state-of-the-art editing performance on multiple datasets.

Model editing aims to correct errors in large, pretrained models without altering unrelated behaviors. While some recent works have edited vision-language models (VLMs), no existing editors tackle reasoning-heavy tasks, which typically require humans and models to reason about images. We therefore propose ReasonEdit, the first VLM editor to let users explain their reasoning during editing, introducing a new, practical model editing setup. ReasonEdit continuously stores human reasoning in a codebook, and retrieves only relevant facts during inference using a novel topology-balanced multimodal embedding method inspired by network science. Across four VLMs on multiple rationale-based visual question answering datasets, ReasonEdit achieves state-of-the-art editing performance, ultimately showing that using human reasoning during editing greatly improves edit generalization.
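The abstract describes two mechanisms: a codebook that continuously accumulates human rationales, and retrieval of only the relevant rationales at inference time so that unrelated behavior is left untouched. The following is a minimal sketch of that general retrieve-then-condition pattern under stated assumptions. It is not ReasonEdit's implementation. The names `ReasoningCodebook`, `build_prompt`, and `cosine`, the 0.8 similarity threshold, and the prompt format are all hypothetical. The embeddings are assumed to come from some unspecified multimodal (image plus question) encoder, and plain cosine similarity stands in for the paper's topology-balanced multimodal embedding, which this page does not describe.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


@dataclass
class ReasoningCodebook:
    """Hypothetical edit memory pairing query embeddings with human rationales.

    Assumption: keys are precomputed multimodal embeddings; cosine similarity is a
    stand-in for ReasonEdit's topology-balanced embedding, not a reproduction of it.
    """
    threshold: float = 0.8  # hypothetical cutoff for treating an edit as relevant
    keys: list[list[float]] = field(default_factory=list)
    rationales: list[str] = field(default_factory=list)

    def add_edit(self, key: list[float], rationale: str) -> None:
        # Edits are appended over time; the base model's weights are never changed.
        self.keys.append(key)
        self.rationales.append(rationale)

    def retrieve(self, query: list[float], k: int = 1) -> list[str]:
        # Score every stored edit, keep those at or above the threshold,
        # and return the k highest-scoring rationales.
        scored = [(cosine(query, key), r) for key, r in zip(self.keys, self.rationales)]
        relevant = [(s, r) for s, r in scored if s >= self.threshold]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in relevant[:k]]


def build_prompt(question: str, codebook: ReasoningCodebook, query_emb: list[float]) -> str:
    # With no relevant edit, the question passes through unchanged, which is how
    # this sketch leaves unrelated inputs unaffected.
    facts = codebook.retrieve(query_emb)
    if not facts:
        return question
    return "Relevant reasoning: " + " ".join(facts) + "\nQuestion: " + question


if __name__ == "__main__":
    cb = ReasoningCodebook(threshold=0.8)
    cb.add_edit([1.0, 0.0, 0.0], "The sign is red and octagonal, so it is a stop sign.")
    print(build_prompt("What should the driver do?", cb, [0.9, 0.1, 0.0]))  # edit retrieved
    print(build_prompt("What color is the sky?", cb, [0.0, 0.0, 1.0]))     # passes through
```

The threshold is what separates "edit applies" from "leave the model alone" in this sketch. How well such a gate generalizes depends almost entirely on the embedding space, which is the part the paper addresses with its topology-balanced method.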
