CV · May 21, 2025

CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation

Xinran Wang, Songyu Xu, Xiangxuan Shan, Yuxuan Zhang, Muxi Diao, Xueyan Duan, Yanhua Huang, Kongming Liang, Zhanyu Ma

arXiv:2505.15145v118.28 citations · h-index: 15 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better evaluation of AI models in cinematography, which is important for researchers and developers in film and AI, though it is incremental as it focuses on benchmarking rather than new methods.

The authors tackled the problem of evaluating multimodal large language models and video generation models on cinematographic technique understanding and generation by creating CineTechBench, a benchmark with expert-annotated data across seven key dimensions. Their evaluation of 15+ MLLMs and 5+ video generation models revealed limitations in current models, providing insights for future improvements in automated film production and appreciation.

Cinematography is a cornerstone of film production and appreciation, shaping mood, emotion, and narrative through visual elements such as camera movement, shot composition, and lighting. Despite recent progress in multimodal large language models (MLLMs) and video generation models, the capacity of current models to grasp and reproduce cinematographic techniques remains largely uncharted, hindered by the scarcity of expert-annotated data. To bridge this gap, we present CineTechBench, a pioneering benchmark founded on precise, manual annotation by seasoned cinematography experts across key cinematography dimensions. Our benchmark covers seven essential aspects (shot scale, shot angle, composition, camera movement, lighting, color, and focal length) and includes over 600 annotated movie images and 120 movie clips with clear cinematographic techniques. For the understanding task, we design question-answer pairs and annotated descriptions to assess MLLMs' ability to interpret and explain cinematographic techniques. For the generation task, we assess advanced video generation models on their capacity to reconstruct cinema-quality camera movements given conditions such as textual prompts or keyframes. We conduct a large-scale evaluation on 15+ MLLMs and 5+ video generation models. Our results offer insights into the limitations of current models and future directions for cinematography understanding and generation in automatic film production and appreciation. The code and benchmark can be accessed at https://github.com/PRIS-CV/CineTechBench.
