CV · Oct 27, 2025

VALA: Learning Latent Anchors for Training-Free and Temporally Consistent

arXiv:2510.22970v1 · h-index: 5
Originality: Incremental advance
AI Analysis

The work addresses the scalability and manual-bias problems that heuristic frame selection causes in video editing with pre-trained diffusion models. It is incremental in that it builds on existing training-free methods.

The paper tackles temporal consistency in training-free video editing. It proposes VALA, a variational alignment module that adaptively selects key frames and compresses their latent features into semantic anchors. The authors report state-of-the-art inversion fidelity, editing quality, and temporal consistency, along with improved efficiency.

Recent advances in training-free video editing have enabled lightweight and precise cross-frame generation by leveraging pre-trained text-to-image diffusion models. However, existing methods often rely on heuristic frame selection to maintain temporal consistency during DDIM inversion, which introduces manual bias and reduces the scalability of end-to-end inference. In this paper, we propose VALA (Variational Alignment for Latent Anchors), a variational alignment module that adaptively selects key frames and compresses their latent features into semantic anchors for consistent video editing. To learn meaningful assignments, VALA adopts a variational framework with a contrastive learning objective. As a result, it can transform cross-frame latent representations into compressed latent anchors that preserve both content and temporal coherence. Our method can be fully integrated into training-free, text-to-image-based video editing models. Extensive experiments on real-world video editing benchmarks show that VALA achieves state-of-the-art performance in inversion fidelity, editing quality, and temporal consistency, while offering improved efficiency over prior methods.
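The abstract describes the anchor mechanism only at a high level, so the following is a toy sketch of one possible reading of it, not the authors' implementation. Several pieces are assumptions: frame latents are flattened to plain vectors; each frame is softly assigned to K anchors by a temperature-scaled softmax over cosine similarity; anchors are assignment-weighted means, computed with a soft k-means-style alternation that is swapped in here for the learned assignment; the contrastive term is an InfoNCE-style loss that treats each frame's most likely anchor as its positive; and a KL term toward a uniform prior stands in for the variational objective. The function names (`soft_assign`, `compress`, `contrastive_loss`, `kl_to_uniform`), the temperature `tau`, the 0.01 KL weight, and the toy data are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_assign(latents: List[Vector], anchors: List[Vector], tau: float = 0.1) -> List[Vector]:
    """q[f][k]: hypothetical posterior probability that frame f belongs to anchor k."""
    return [softmax([cosine(z, a) / tau for a in anchors]) for z in latents]


def compress(latents: List[Vector], q: List[Vector]) -> List[Vector]:
    """Each anchor becomes the assignment-weighted mean of the frame latents."""
    num_anchors, dim = len(q[0]), len(latents[0])
    anchors = []
    for k in range(num_anchors):
        weight = sum(q_f[k] for q_f in q) + 1e-12
        anchors.append([sum(q_f[k] * z[d] for q_f, z in zip(q, latents)) / weight
                        for d in range(dim)])
    return anchors


def contrastive_loss(latents: List[Vector], anchors: List[Vector],
                     q: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE-style loss: the most likely anchor is the positive, the rest are negatives."""
    total = 0.0
    for z, q_f in zip(latents, q):
        positive = max(range(len(anchors)), key=lambda k: q_f[k])
        probs = softmax([cosine(z, a) / tau for a in anchors])
        total += -math.log(probs[positive] + 1e-12)
    return total / len(latents)


def kl_to_uniform(q: List[Vector]) -> float:
    """Mean KL(q_f || Uniform(K)); an assumed stand-in for a variational prior term."""
    num_anchors = len(q[0])
    total = sum(sum(p * math.log(p * num_anchors) for p in q_f if p > 0) for q_f in q)
    return total / len(q)


# Toy 2-D "latents" for four frames (real latents would be flattened tensors).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
anchors = [frames[0], frames[2]]  # hypothetical initial key frames
for _ in range(3):  # soft k-means-style alternation: assign, then compress
    q = soft_assign(frames, anchors)
    anchors = compress(frames, q)

# Combined objective: contrastive term plus a small KL term toward the uniform prior.
loss = contrastive_loss(frames, anchors, q) + 0.01 * kl_to_uniform(q)
print([max(range(len(anchors)), key=lambda k: q_f[k]) for q_f in q], round(loss, 4))
```

In this toy run, the two near-horizontal frames should collapse onto one anchor and the two near-vertical frames onto the other, so four frame latents are summarized by two anchors. A learned module would replace the fixed alternation with trainable assignments optimized against the combined loss.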
