CV · AI · LG · Feb 15

When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance

arXiv:2602.14157v1
Originality: Incremental advance
AI Analysis

This work offers a faster, more practical approach to image and video editing, though the advance is incremental: it builds on the VJP-free approximation of Moufad et al. (2025).

The paper tackles slow text-driven image and video editing with diffusion models by eliminating costly vector-Jacobian product computations, and shows that test-time guidance alone performs comparably to, and in some cases better than, training-based methods on large-scale benchmarks.

Text-driven image and video editing can be naturally cast as inpainting problems, where masked regions are reconstructed to remain consistent with both the observed content and the editing prompt. Recent advances in test-time guidance for diffusion and flow models provide a principled framework for this task; however, existing methods rely on costly vector-Jacobian product (VJP) computations to approximate the intractable guidance term, limiting their practical applicability. Building upon the recent work of Moufad et al. (2025), we provide theoretical insights into their VJP-free approximation and substantially extend their empirical evaluation to large-scale image and video editing benchmarks. Our results demonstrate that test-time guidance alone can match, and in some cases surpass, the performance of training-based methods.

