A Survey: Spatiotemporal Consistency in Video Generation
This survey addresses the challenge of maintaining temporal coherence, which matters to researchers and practitioners working on AI-generated content, but its contribution is incremental: it organizes existing knowledge rather than presenting new methods.
This survey tackles the problem of spatiotemporal consistency in video generation by systematically reviewing recent advances across five key aspects, aiming to fill a gap in the literature and inspire future development in the field.
Video generation, by leveraging a dynamic visual generation method, pushes the boundaries of Artificial Intelligence Generated Content (AIGC). It presents unique challenges beyond static image generation, requiring both high-quality individual frames and temporal coherence to maintain consistency across the spatiotemporal sequence. Recent works have aimed to address the spatiotemporal consistency issue in video generation, yet few literature reviews have been organized from this perspective. This gap hinders a deeper understanding of the underlying mechanisms for high-quality video generation. In this survey, we systematically review recent advances in video generation, covering five key aspects: foundation models, information representations, generation schemes, post-processing techniques, and evaluation metrics. We focus in particular on their contributions to maintaining spatiotemporal consistency. Finally, we discuss future directions and challenges in this field, hoping to inspire further efforts to advance the development of video generation.