CV, Sep 10, 2025

GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts

Jenna Kang, Maria Silva, Patsorn Sangkloy, Kenneth Chen, Niall Williams, Qi Sun

arXiv:2509.08818v16.21 citations, h-index: 10

Originality: Synthesis-oriented

AI Analysis

GeneVA gives researchers and developers a dataset for benchmarking and improving generative video models, filling a gap left by existing resources, which focus on images.

The paper tackles the problem of unpredictable artifacts in text-to-video generation by introducing GeneVA, a large-scale dataset with human annotations for spatio-temporal artifacts, enabling systematic benchmarking and quality improvement.

Recent advances in probabilistic generative models have extended capabilities from static image synthesis to text-driven video generation. However, the inherent randomness of their generation process can lead to unpredictable artifacts, such as impossible physics and temporal inconsistency. Progress in addressing these challenges requires systematic benchmarks, yet existing datasets primarily focus on generative images due to the unique spatio-temporal complexities of videos. To bridge this gap, we introduce GeneVA, a large-scale artifact dataset with rich human annotations that focuses on spatio-temporal artifacts in videos generated from natural text prompts. We hope GeneVA can enable and assist critical applications, such as benchmarking model performance and improving generative video quality.
