CV · LG · Dec 12, 2025

BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models

arXiv:2512.12080v1 · 13 citations · h-index: 10
Originality: Incremental advance
AI Analysis

The paper addresses a key limitation of video generation in applications that require coherent long sequences, though it is an incremental improvement over existing methods.

The paper tackles the problem of exposure bias in autoregressive video models, which causes quality drift over time, by introducing Backwards Aggregation (BAgger), a self-supervised scheme that improves long-horizon motion stability and visual consistency in tasks like text-to-video generation.

Autoregressive video models are promising for world modeling via next-frame prediction, but they suffer from exposure bias: a mismatch between training on clean contexts and inference on self-generated frames, causing errors to compound and quality to drift over time. We introduce Backwards Aggregation (BAgger), a self-supervised scheme that constructs corrective trajectories from the model's own rollouts, teaching it to recover from its mistakes. Unlike prior approaches that rely on few-step distillation and distribution-matching losses, which can hurt quality and diversity, BAgger trains with standard score or flow matching objectives, avoiding large teachers and long-chain backpropagation through time. We instantiate BAgger on causal diffusion transformers and evaluate on text-to-video, video extension, and multi-prompt generation, observing more stable long-horizon motion and better visual consistency with reduced drift.
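The exposure bias described above can be illustrated with a toy example. The sketch below is not BAgger and not a video model: it assumes a hypothetical one-step predictor (`step_model`) that is slightly wrong relative to hypothetical true dynamics (`step_true`), with each frame reduced to a single point on the unit circle. It compares the error when the predictor always sees clean context, as in training, against the error when it consumes its own outputs, as at inference.

```python
import cmath

# Toy setup (all names and constants are illustrative assumptions):
# a "frame" is a point on the unit circle, and the true dynamics
# rotate it by 0.1 rad per step.
def step_true(x: complex) -> complex:
    return x * cmath.exp(1j * 0.1)

# An imperfect learned predictor: slightly too fast and slightly expansive.
def step_model(x: complex) -> complex:
    return x * 1.01 * cmath.exp(1j * 0.11)

def compare_errors(num_frames: int = 50):
    clean = 1 + 0j       # ground-truth context (what training sees)
    generated = 1 + 0j   # self-generated context (what inference sees)
    teacher_forced, free_running = [], []
    for _ in range(num_frames):
        target = step_true(clean)
        # Training-style error: predict the next frame from the clean frame.
        teacher_forced.append(abs(step_model(clean) - target))
        # Inference-style error: predict from the model's own previous output.
        generated = step_model(generated)
        free_running.append(abs(generated - target))
        clean = target
    return teacher_forced, free_running

teacher_forced, free_running = compare_errors()
print(f"teacher-forced error at last frame: {teacher_forced[-1]:.4f}")
print(f"free-running error at last frame:   {free_running[-1]:.4f}")
```

In this toy, the clean-context error stays constant at every step while the self-conditioned error compounds, which is the drift the abstract attributes to the train/inference mismatch. Training on corrective trajectories built from the model's own rollouts, as the abstract describes, targets exactly this gap.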
