CV · Jun 4

V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation

arXiv:2606.05665 · 57.5 · Has Code
AI Analysis

For researchers and practitioners in video generation, this benchmark provides a standardized evaluation tool that captures V2V-specific aspects not covered by existing metrics.

The paper introduces V2V-Bench, a comprehensive 11-dimension benchmark for evaluating video-to-video generation, and shows it achieves a Spearman correlation of 0.905 with human judgments on six V2V-specific dimensions.

Video-to-video (V2V) generation is difficult to evaluate because outputs must both follow editing instructions and preserve frame-level correspondence with the source video, which existing T2V and I2V metrics do not capture. We introduce V2V-Bench, an 11-dimension benchmark organized into five categories: temporal alignment, structural fidelity, transformation quality, video quality, and semantic alignment. V2V-Bench pairs diverse source videos with challenging editing tasks and evaluates two commercial models, Grok Imagine and Gemini Veo3, and one open-source model, Open Sora 2. Results show complementary model strengths: Grok performs better on editing fidelity, while Veo3 achieves stronger visual quality. On six V2V-specific dimensions, V2V-Bench reaches a Spearman correlation of 0.905 with human judgments.
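
For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how a Spearman rank correlation between automatic scores and human ratings can be computed. It is not the benchmark's code: the function names `average_ranks` and `spearman` and the two score lists are hypothetical, chosen only for illustration, and the paper's actual aggregation across dimensions may differ.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(vx * vy)


# Hypothetical data: one automatic score and one mean human rating per video.
metric_scores = [0.82, 0.64, 0.91, 0.47, 0.73]
human_ratings = [4.1, 3.2, 4.6, 2.5, 3.0]

rho = spearman(metric_scores, human_ratings)
print(round(rho, 3))  # 0.9
```

Because only the rank order matters, the automatic scores and the human ratings do not need to share a scale; a value near 1 means the metric orders the videos almost the same way the annotators do.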

