Artifact-Aware Evaluation for High-Quality Video Generation
This work addresses the need for fine-grained artifact evaluation in video generation. The contribution is incremental in that it builds on existing coarse evaluation methods.
The paper tackles the problem of evaluating generated videos by introducing a comprehensive evaluation protocol covering appearance, motion, and camera artifacts, organized into a taxonomy of 10 categories, and reports that its DVAR framework significantly improves artifact detection accuracy.
With the rapid advancement of video generation techniques, evaluating and auditing generated videos has become increasingly crucial. Existing approaches typically offer coarse video quality scores, lacking detailed localization and categorization of specific artifacts. In this work, we introduce a comprehensive evaluation protocol focusing on three key aspects affecting human perception: Appearance, Motion, and Camera. We define these axes through a taxonomy of 10 prevalent artifact categories reflecting common generative failures observed in video generation. To enable robust artifact detection and categorization, we introduce GenVID, a large-scale dataset of 80k videos generated by various state-of-the-art video generation models, each carefully annotated for the defined artifact categories. Leveraging GenVID, we develop DVAR, a Dense Video Artifact Recognition framework for fine-grained identification and classification of generative artifacts. Extensive experiments show that our approach significantly improves artifact detection accuracy and enables effective filtering of low-quality content.
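To make the filtering use case concrete, the following is a minimal sketch of how per-video artifact predictions might be used to discard low-quality content. It is not the DVAR implementation: the `ArtifactDetection` record, the `keep_video` and `filter_videos` helpers, and the 0.5 threshold are all hypothetical names and values chosen for illustration. Only the three axes (Appearance, Motion, Camera) come from the abstract, and the 10 artifact categories are left out because the abstract does not name them.

```python
from dataclasses import dataclass
from enum import Enum


class Axis(Enum):
    # The three evaluation axes named in the abstract.
    APPEARANCE = "appearance"
    MOTION = "motion"
    CAMERA = "camera"


@dataclass(frozen=True)
class ArtifactDetection:
    # Hypothetical record for one detected artifact (assumed layout).
    axis: Axis
    start_s: float  # start of the affected segment, in seconds
    end_s: float    # end of the affected segment, in seconds
    score: float    # detector confidence, assumed to lie in [0, 1]


def keep_video(detections, threshold=0.5):
    """Keep a video only if no detection reaches the confidence threshold."""
    return all(d.score < threshold for d in detections)


def filter_videos(predictions, threshold=0.5):
    """Return the ids of videos that pass the artifact filter.

    `predictions` maps a video id to its list of ArtifactDetection records.
    """
    return [vid for vid, dets in predictions.items() if keep_video(dets, threshold)]


if __name__ == "__main__":
    example = {
        "clip_a": [],
        "clip_b": [ArtifactDetection(Axis.MOTION, 0.0, 1.0, 0.9)],
        "clip_c": [ArtifactDetection(Axis.CAMERA, 0.5, 1.5, 0.2)],
    }
    print(filter_videos(example))
```

A single global threshold is the simplest choice. A per-axis threshold would be a natural variation if, for instance, camera artifacts are judged less objectionable than appearance artifacts.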