CV · AI · Oct 16, 2025

Coupled Diffusion Sampling for Training-Free Multi-View Image Editing

arXiv:2510.14981v1 · 5 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses multi-view consistency in 3D scene editing, but it is an incremental advance because it builds on existing diffusion methods.

The paper tackles multi-view consistent image editing with pre-trained 2D models, which produce inconsistent edits across views when applied to each image independently. It proposes coupled diffusion sampling, a training-free technique that enforces cross-view consistency at inference time, and validates it on three distinct multi-view editing tasks.

We present an inference-time diffusion sampling method to perform multi-view consistent image editing using pre-trained 2D image editing models. These models can independently produce high-quality edits for each image in a set of multi-view images of a 3D scene or object, but they do not maintain consistency across views. Existing approaches typically address this by optimizing over explicit 3D representations, but they suffer from a lengthy optimization process and instability under sparse view settings. We propose an implicit 3D regularization approach by constraining the generated 2D image sequences to adhere to a pre-trained multi-view image distribution. This is achieved through coupled diffusion sampling, a simple diffusion sampling technique that concurrently samples two trajectories from both a multi-view image distribution and a 2D edited image distribution, using a coupling term to enforce the multi-view consistency among the generated images. We validate the effectiveness and generality of this framework on three distinct multi-view image editing tasks, demonstrating its applicability across various model architectures and highlighting its potential as a general solution for multi-view consistent editing.
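The coupling idea in the abstract can be illustrated with a toy sketch. The abstract does not give the models, noise schedule, or exact form of the coupling term, so everything below is an assumption made for illustration. Two isotropic Gaussians with analytic scores stand in for the pre-trained multi-view prior (A) and the 2D editing model (B). Each trajectory follows a variance-exploding reverse SDE discretized with Euler-Maruyama. The coupling is assumed to be the gradient of a quadratic penalty `-lam/2 * ||x - y||^2`. The names `coupled_sampling`, `gaussian_score`, and `lam` are hypothetical and do not come from the paper.

```python
import numpy as np


def gaussian_score(x, mean, data_std, sigma):
    """Score of N(mean, data_std^2 I) convolved with noise N(0, sigma^2 I)."""
    return -(x - mean) / (data_std**2 + sigma**2)


def coupled_sampling(mean_a, mean_b, data_std=0.5, lam=1.0,
                     sigma_max=5.0, sigma_min=0.01, n_steps=1000,
                     n_samples=2000, seed=0):
    """Toy coupled sampler (illustrative assumption, not the paper's method).

    Runs two variance-exploding reverse-SDE trajectories side by side:
    x follows distribution A (stand-in for the multi-view prior) and
    y follows distribution B (stand-in for the per-image editor).
    Each score is augmented with an assumed quadratic coupling term.
    """
    rng = np.random.default_rng(seed)
    mean_a = np.asarray(mean_a, dtype=float)
    mean_b = np.asarray(mean_b, dtype=float)
    dim = mean_a.shape[0]
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps)

    # Both chains start from the same high-noise prior N(0, sigma_max^2 I).
    x = sigma_max * rng.standard_normal((n_samples, dim))
    y = sigma_max * rng.standard_normal((n_samples, dim))

    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        step = sigma**2 - sigma_next**2  # noise variance removed at this step
        # Assumed coupling: gradient of -lam/2 * ||x - y||^2 w.r.t. each chain,
        # computed from the current states before either chain is updated.
        coupling_x = -lam * (x - y)
        coupling_y = -lam * (y - x)
        score_x = gaussian_score(x, mean_a, data_std, sigma) + coupling_x
        score_y = gaussian_score(y, mean_b, data_std, sigma) + coupling_y
        # Euler-Maruyama step of the reverse SDE for each trajectory.
        x = x + step * score_x + np.sqrt(step) * rng.standard_normal(x.shape)
        y = y + step * score_y + np.sqrt(step) * rng.standard_normal(y.shape)
    return x, y


def mean_gap(a, b):
    """Average Euclidean distance between paired samples."""
    return np.linalg.norm(a - b, axis=1).mean()


if __name__ == "__main__":
    means_a, means_b = [0.0, 0.0, 0.0], [4.0, 4.0, 4.0]
    x_ind, y_ind = coupled_sampling(means_a, means_b, lam=0.0)  # independent
    x_cpl, y_cpl = coupled_sampling(means_a, means_b, lam=1.0)  # coupled
    print("independent gap:", mean_gap(x_ind, y_ind))
    print("coupled gap:    ", mean_gap(x_cpl, y_cpl))
```

Setting `lam=0` reduces the sketch to two independent samplers. A positive `lam` pulls the paired samples toward each other while each trajectory is still guided by its own score. In this toy setting, that is the role the abstract assigns to the coupling term.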
