CV · AI · May 7

A$^2$RD: Agentic Autoregressive Diffusion for Long Video Consistency

arXiv:2605.0692491.21 citations
Predicted impact: top 14% in CV · last 90 days · Originality: Highly original
AI Analysis

This work addresses semantic drift and narrative collapse in long video generation, a problem relevant to video synthesis researchers and practitioners.

A$^2$RD introduces an agentic autoregressive diffusion architecture for long video synthesis that decouples creative generation from consistency enforcement, achieving up to 30% improvement in consistency and 20% in narrative coherence over state-of-the-art baselines on benchmarks spanning one to ten minutes.

Synthesizing consistent and coherent long video remains a fundamental challenge. Existing methods suffer from semantic drift and narrative collapse over long horizons. We present A$^2$RD, an Agentic Auto-Regressive Diffusion architecture that decouples creative synthesis from consistency enforcement. A$^2$RD formulates long video synthesis as a closed-loop process that synthesizes and self-improves video segment-by-segment through a Retrieve--Synthesize--Refine--Update cycle. It comprises three core components: (i) Multimodal Video Memory that tracks video progression across modalities; (ii) Adaptive Segment Generation that switches among generation modes for natural progression and visual consistency; and (iii) Hierarchical Test-Time Self-Improvement that self-improves each segment at frame and video levels to prevent error propagation. We further introduce LVBench-C, a challenging benchmark with non-linear entity and environment transitions to stress-test long-horizon consistency. Across public and LVBench-C benchmarks spanning one- to ten-minute videos, A$^2$RD outperforms state-of-the-art baselines by up to 30% in consistency and 20% in narrative coherence. Human evaluations corroborate these gains while also highlighting notable improvements in motion and transition smoothness.
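The Retrieve--Synthesize--Refine--Update cycle described above can be pictured as a simple control loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `Memory` class, `generate_long_video`, the `threshold` and `max_refinements` parameters, and the toy `toy_*` callables are all hypothetical names introduced for illustration, and segments are represented as plain strings rather than video.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    """Hypothetical stand-in for a video memory: a list of past segments."""
    entries: List[str] = field(default_factory=list)

    def retrieve(self, k: int = 2) -> List[str]:
        # Return the k most recent segments as context.
        return self.entries[-k:]

    def update(self, segment: str) -> None:
        self.entries.append(segment)


def generate_long_video(
    prompts: List[str],
    synthesize: Callable[[str, List[str]], str],
    score: Callable[[str, List[str]], float],
    refine: Callable[[str, List[str]], str],
    threshold: float = 0.5,       # assumed acceptance score
    max_refinements: int = 3,     # assumed refinement budget per segment
) -> List[str]:
    memory = Memory()
    video: List[str] = []
    for prompt in prompts:
        context = memory.retrieve()              # Retrieve
        segment = synthesize(prompt, context)    # Synthesize
        for _ in range(max_refinements):         # Refine
            if score(segment, context) >= threshold:
                break
            segment = refine(segment, context)
        memory.update(segment)                   # Update
        video.append(segment)
    return video


# Toy callables: a "draft" marker plays the role of an inconsistency.
def toy_synthesize(prompt: str, context: List[str]) -> str:
    return f"{prompt} [draft, ctx={len(context)}]"


def toy_score(segment: str, context: List[str]) -> float:
    return 0.0 if "[draft" in segment else 1.0


def toy_refine(segment: str, context: List[str]) -> str:
    return segment.split(" [draft")[0]


video = generate_long_video(
    ["sunrise", "market", "sunset"], toy_synthesize, toy_score, toy_refine
)
print(video)
```

In this sketch each segment is only committed to memory after it passes the score check or exhausts its refinement budget, which is one way to keep an early error from propagating into later segments.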

