CV · Jul 19, 2024

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

arXiv:2407.14505v2 · 138 citations · h-index: 14
Originality: Synthesis-oriented
AI Analysis

The benchmark addresses a critical gap in the evaluation of text-to-video models, which matters to researchers and developers working on AI video generation, though the contribution is incremental because it builds on existing evaluation frameworks.

The paper tackles the lack of evaluation for compositional abilities in text-to-video generation by introducing T2V-CompBench, a benchmark with 1400 prompts across seven categories, and finds that current models struggle significantly with these tasks.

Text-to-video (T2V) generative models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks also neglect this important ability in evaluation. In this work, we conduct the first systematic study on compositional text-to-video generation. We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation. T2V-CompBench encompasses diverse aspects of compositionality, including consistent attribute binding, dynamic attribute binding, spatial relationships, motion binding, action binding, object interactions, and generative numeracy. We further carefully design multimodal large language model (MLLM)-based, detection-based, and tracking-based evaluation metrics, which better reflect compositional text-to-video generation quality across the seven proposed categories with 1400 text prompts. The effectiveness of the proposed metrics is verified by correlation with human evaluations. We also benchmark various text-to-video generative models and conduct in-depth analysis across different models and compositional categories. We find that compositional text-to-video generation is highly challenging for current models, and we hope this work sheds light on future research in this direction.
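The abstract says the metrics are validated by correlation with human evaluations but does not name the correlation coefficient. As a minimal sketch, assuming a Spearman rank correlation is one reasonable choice, the check could look like the following. The function names and the toy scores are hypothetical and are not taken from the paper or its code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(metric_scores, human_scores):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical toy values: one automatic score and one human rating per video.
    metric_scores = [0.2, 0.5, 0.9, 0.4]
    human_scores = [1, 3, 5, 2]
    print(spearman(metric_scores, human_scores))
```

A coefficient near 1 would indicate that the automatic metric orders videos much as human raters do. A rank-based statistic is used here because automatic scores and human ratings usually live on different scales.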

Code Implementations: 1 repo
