CV · AI · ET · IV · Mar 10

When to Lock Attention: Training-Free KV Control in Video Diffusion

arXiv:2603.09657v1 · 85.0 · h-index: 14
Predicted impact: top 22% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses a core problem in video editing with diffusion models, though it is an incremental advance that builds on existing DiT-based methods.

The paper tackles the challenge of maintaining background consistency while enhancing foreground quality in video editing. It proposes KV-Lock, a training-free framework for DiT-based video diffusion models that outperforms existing approaches, improving foreground quality while preserving high background fidelity.

Maintaining background consistency while enhancing foreground quality remains a core challenge in video editing. Injecting full-image information often leads to background artifacts, whereas rigid background locking severely constrains the model's capacity for foreground generation. To address this issue, we propose KV-Lock, a training-free framework tailored for DiT-based video diffusion models. Our core insight is that the hallucination metric (the variance of the denoising prediction) directly quantifies generation diversity, which is inherently linked to the classifier-free guidance (CFG) scale. Building on this, KV-Lock leverages diffusion hallucination detection to dynamically schedule two key components: the fusion ratio between cached background key-values (KVs) and newly generated KVs, and the CFG scale. When hallucination risk is detected, KV-Lock strengthens background KV locking and simultaneously amplifies conditional guidance for foreground generation, thereby mitigating artifacts and improving generation fidelity. As a training-free, plug-and-play module, KV-Lock can be easily integrated into any pre-trained DiT-based model. Extensive experiments show that our method outperforms existing approaches across various video editing tasks, improving foreground quality while maintaining high background fidelity.
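The scheduling idea in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. The names (`LockSchedule`, `hallucination_score`, `schedule`, `fuse_kv`, `cfg_combine`), the threshold values, the linear mapping from score to fusion ratio and CFG scale, and the assumption that K repeated denoising predictions are available to estimate the variance are all assumptions made for illustration. KVs are modeled as plain lists of per-token vectors, and fusion is applied only at background tokens.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Sequence, Tuple

Vector = List[float]


@dataclass
class LockSchedule:
    # Hypothetical thresholds and ranges; not values reported by the paper.
    tau_low: float = 0.01    # below this score: no hallucination risk
    tau_high: float = 0.05   # above this score: maximal risk
    alpha_min: float = 0.5   # weakest background KV locking
    alpha_max: float = 1.0   # full background KV locking
    cfg_base: float = 5.0    # guidance scale when risk is zero
    cfg_max: float = 7.5     # guidance scale at maximal risk


def hallucination_score(predictions: Sequence[Vector]) -> float:
    """Mean per-element variance across K denoising predictions of one latent."""
    n = len(predictions[0])
    return sum(pvariance([p[i] for p in predictions]) for i in range(n)) / n


def schedule(score: float, s: LockSchedule) -> Tuple[float, float]:
    """Map the score to (fusion ratio alpha, CFG scale); linear ramp is assumed."""
    r = (score - s.tau_low) / (s.tau_high - s.tau_low)
    r = min(1.0, max(0.0, r))  # clip risk to [0, 1]
    alpha = s.alpha_min + r * (s.alpha_max - s.alpha_min)
    cfg = s.cfg_base + r * (s.cfg_max - s.cfg_base)
    return alpha, cfg


def fuse_kv(cached: Sequence[Vector], fresh: Sequence[Vector],
            bg_mask: Sequence[bool], alpha: float) -> List[Vector]:
    """Blend cached background KVs into fresh KVs at background tokens only."""
    return [
        [alpha * c + (1.0 - alpha) * x for c, x in zip(c_tok, f_tok)]
        if is_bg else list(f_tok)  # foreground tokens keep the fresh KVs
        for c_tok, f_tok, is_bg in zip(cached, fresh, bg_mask)
    ]


def cfg_combine(eps_uncond: Vector, eps_cond: Vector, scale: float) -> Vector:
    """Standard classifier-free guidance: u + w * (c - u)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    s = LockSchedule()
    preds = [[0.0, 1.0], [0.2, 1.2], [0.4, 0.8]]  # toy K=3 predictions
    alpha, cfg = schedule(hallucination_score(preds), s)
    print(round(alpha, 3), round(cfg, 3))
```

Under these assumptions, a higher prediction variance pushes `alpha` toward full background locking and raises the guidance scale at the same time, matching the coupled behaviour the abstract describes. In a real DiT, a fusion like `fuse_kv` would run inside each attention layer at every denoising step rather than on toy lists.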
