CV · LG · Jan 26

Self-Refining Video Sampling

arXiv:2601.18577v1 · 4 citations · h-index: 19
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of generating physically realistic videos for AI and creative applications. It is an incremental improvement over existing methods.

The paper tackles poor physical realism in video generation with a self-refining sampling method that iteratively refines outputs at inference time, without extra training or an external verifier. In human evaluation of motion coherence and physics alignment, it is preferred over the default and guidance-based samplers more than 70% of the time.

Modern video generators still struggle with complex physical dynamics, often falling short of physical realism. Existing approaches address this using external verifiers or additional training on augmented data, which is computationally expensive and still limited in capturing fine-grained motion. In this work, we present self-refining video sampling, a simple method that uses a pre-trained video generator trained on large-scale datasets as its own self-refiner. By interpreting the generator as a denoising autoencoder, we enable iterative inner-loop refinement at inference time without any external verifier or additional training. We further introduce an uncertainty-aware refinement strategy that selectively refines regions based on self-consistency, which prevents artifacts caused by over-refinement. Experiments on state-of-the-art video generators demonstrate significant improvements in motion coherence and physics alignment, achieving over 70% human preference compared to the default sampler and guidance-based sampler.
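
The abstract gives no algorithmic detail, so the following is only a rough sketch of how an inner refinement loop with a self-consistency mask might look. It is not the authors' implementation. Everything here is an assumption made for illustration: the names `toy_denoiser`, `refine_at_level`, and `sample`, the parameters `num_inner` and `tau`, the re-noise-then-denoise update, the absolute-difference uncertainty measure, and the Gaussian-prior denoiser that stands in for a pre-trained video generator.

```python
import numpy as np

def toy_denoiser(x_t, sigma, prior_mean=0.0, prior_std=1.0):
    """Hypothetical stand-in for a pre-trained generator.

    Returns the posterior mean of the clean sample under a Gaussian prior,
    given x_t = x0 + sigma * noise.
    """
    w = prior_std**2 / (prior_std**2 + sigma**2)
    return w * x_t + (1.0 - w) * prior_mean

def refine_at_level(x_t, sigma, denoise, rng, num_inner=3, tau=0.1):
    """Assumed inner loop: re-noise the clean estimate, denoise again,
    and accept the update only where successive estimates disagree."""
    x0_hat = denoise(x_t, sigma)
    for _ in range(num_inner):
        # Perturb the current clean estimate back to the same noise level.
        x_t_new = x0_hat + sigma * rng.standard_normal(x_t.shape)
        x0_new = denoise(x_t_new, sigma)
        # Assumed uncertainty proxy: disagreement between successive estimates.
        uncertain = np.abs(x0_new - x0_hat) > tau
        # Refine only uncertain regions; leave self-consistent regions untouched.
        x_t = np.where(uncertain, x_t_new, x_t)
        x0_hat = np.where(uncertain, x0_new, x0_hat)
    return x_t, x0_hat

def sample(shape, sigmas, denoise, rng, num_inner=3, tau=0.1):
    """Outer sampler (deterministic step) with the inner loop at each level."""
    x_t = sigmas[0] * rng.standard_normal(shape)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x_t, x0_hat = refine_at_level(x_t, sigma, denoise, rng, num_inner, tau)
        # Move toward the clean estimate, scaling the residual to sigma_next.
        x_t = x0_hat + (sigma_next / sigma) * (x_t - x0_hat)
    return x_t

# Example: a tiny "video" of 4 frames at 8x8 resolution.
video = sample((4, 8, 8), [10.0, 5.0, 1.0, 0.0], toy_denoiser,
               np.random.default_rng(0))
```

In this sketch, setting `tau` to infinity disables refinement entirely and recovers the plain outer sampler, while a very small `tau` refines every region at every inner step, which is the over-refinement regime the abstract warns about.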

