CV | Nov 20, 2025

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

Yang Luo, Xuanlei Zhao, Baijiong Lin, Lingting Zhu, Liyao Tang, Yuqi Liu, Ying-Cong Chen, Shengju Qian, Xin Wang, Yang You

arXiv:2511.16668v121.714 citations | h-index: 11

Originality: Synthesis-oriented

AI Analysis

V-ReasonBench provides a unified framework for evaluating video reasoning and supports the development of more reliable models, though the contribution is incremental in that it responds to existing benchmarking needs rather than opening a new direction.

The authors tackled the lack of systematic evaluation for video generation models by introducing V-ReasonBench, a benchmark assessing reasoning across four dimensions, and found clear differences in performance among six state-of-the-art models.

Recent progress in generative video models, such as Veo-3, has shown surprising zero-shot reasoning abilities, creating a growing need for systematic and reliable evaluation. We introduce V-ReasonBench, a benchmark designed to assess video reasoning across four key dimensions: structured problem-solving, spatial cognition, pattern-based inference, and physical dynamics. The benchmark is built from both synthetic and real-world image sequences and provides a diverse set of answer-verifiable tasks that are reproducible, scalable, and unambiguous. Evaluations of six state-of-the-art video models reveal clear dimension-wise differences, with strong variation in structured, spatial, pattern-based, and physical reasoning. We further compare video models with strong image models, analyze common hallucination behaviors, and study how video duration affects Chain-of-Frames reasoning. Overall, V-ReasonBench offers a unified and reproducible framework for measuring video reasoning and aims to support the development of models with more reliable, human-aligned reasoning skills.
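
As a rough illustration of what "answer-verifiable" and "dimension-wise" evaluation could look like in practice, the sketch below aggregates pass/fail outcomes into a pass rate per reasoning dimension. Only the four dimension names are taken from the abstract; the function name `dimension_pass_rates`, the `(dimension, passed)` record format, and the toy outcomes are hypothetical and are not the benchmark's actual scoring code or data.

```python
from collections import defaultdict

# The four dimension names come from the abstract; everything else here
# (function name, record format, toy data) is a hypothetical illustration.
DIMENSIONS = (
    "structured problem-solving",
    "spatial cognition",
    "pattern-based inference",
    "physical dynamics",
)


def dimension_pass_rates(results):
    """Return the fraction of passed tasks per dimension.

    `results` is an iterable of (dimension, passed) pairs, one per task,
    where `passed` is True if the generated answer was verified as correct.
    Dimensions with no tasks are omitted from the returned dict.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for dimension, passed in results:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        totals[dimension] += 1
        passes[dimension] += int(passed)
    return {d: passes[d] / totals[d] for d in DIMENSIONS if totals[d]}


# Made-up outcomes for illustration only, not results from the paper.
toy_results = [
    ("structured problem-solving", True),
    ("structured problem-solving", False),
    ("spatial cognition", True),
    ("physical dynamics", False),
]
rates = dimension_pass_rates(toy_results)
```

Reporting one rate per dimension, rather than a single pooled score, is what makes dimension-wise differences between models visible.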
