CV | Jul 31, 2025

World Consistency Score: A Unified Metric for Video Generation Quality

arXiv:2508.00144v1 | 3 citations | h-index: 6 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the need for better evaluation metrics in video generation, particularly for assessing temporal and physical coherence. The advance is incremental because it builds on existing tools and benchmarks.

The paper tackles the problem of evaluating video generation quality by introducing World Consistency Score (WCS), a unified metric that measures internal world consistency through four interpretable sub-components. The components are combined with weights learned from human preference data, so the final score is designed to align with human judgments.

We introduce World Consistency Score (WCS), a novel unified evaluation metric for generative video models that emphasizes internal world consistency of the generated videos. WCS integrates four interpretable sub-components - object permanence, relation stability, causal compliance, and flicker penalty - each measuring a distinct aspect of temporal and physical coherence in a video. These submetrics are combined via a learned weighted formula to produce a single consistency score that aligns with human judgments. We detail the motivation for WCS in the context of existing video evaluation metrics, formalize each submetric and how it is computed with open-source tools (trackers, action recognizers, CLIP embeddings, optical flow), and describe how the weights of the WCS combination are trained using human preference data. We also outline an experimental validation blueprint: using benchmarks like VBench-2.0, EvalCrafter, and LOVE to test WCS's correlation with human evaluations, performing sensitivity analyses, and comparing WCS against established metrics (FVD, CLIPScore, VBench, FVMD). The proposed WCS offers a comprehensive and interpretable framework for evaluating video generation models on their ability to maintain a coherent "world" over time, addressing gaps left by prior metrics focused only on visual fidelity or prompt alignment.
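The abstract describes a learned weighted combination of four submetrics trained on human preference data, but does not give the formula here. The following is a minimal sketch of one way such a combination could work, not the paper's implementation. It assumes a plain linear combination and a Bradley-Terry style logistic loss over pairwise preferences; the names `wcs`, `fit_weights`, and `SUBMETRICS`, the learning rate, and the step count are all hypothetical.

```python
import numpy as np

# Order of the four submetrics named in the abstract (key names are illustrative).
SUBMETRICS = (
    "object_permanence",
    "relation_stability",
    "causal_compliance",
    "flicker_penalty",
)


def wcs(scores, weights):
    """Combine per-video submetric scores into one number.

    Assumption: a linear combination. The flicker penalty is expected to
    receive a negative weight once the weights are fitted.
    """
    x = np.array([scores[name] for name in SUBMETRICS], dtype=float)
    return float(np.dot(weights, x))


def fit_weights(preferred, rejected, lr=0.5, steps=2000):
    """Fit weights from pairwise human preferences.

    preferred, rejected: arrays of shape (n_pairs, 4) holding the submetric
    vectors of the video humans preferred and the one they rejected.
    Minimizes the logistic loss -log(sigmoid(w . (preferred - rejected)))
    with plain gradient descent.
    """
    diff = np.asarray(preferred, dtype=float) - np.asarray(rejected, dtype=float)
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        # Modelled probability that the preferred video outscores the rejected one.
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))
        # Gradient of the mean logistic loss with respect to w.
        grad = -(diff * (1.0 - p)[:, None]).mean(axis=0)
        w -= lr * grad
    return w
```

In this sketch, fitted weights would be passed to `wcs` together with a dictionary of submetric scores for a new video. The linear form keeps each weight directly interpretable as the contribution of one submetric, which fits the abstract's emphasis on interpretability, though the paper's actual formula may differ.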
