CV | Mar 19

Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos

arXiv:2603.19048 | 76.3 | h-index: 15
AI Analysis

This paper addresses the inaccurate evaluation of geometric consistency in generated videos, a problem relevant to researchers and developers working on video generation. The contribution is incremental, since it builds on existing methods for depth prediction and camera pose estimation.

The paper tackles 3D spatial geometric inconsistencies in dynamically generated videos by introducing SGC, a metric that quantifies them by measuring the divergence among camera poses estimated from distinct local regions. Experiments show that it robustly identifies failures missed by existing metrics.

Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to accurately characterize these inconsistencies: fidelity-centric metrics like FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a metric for evaluating 3D Spatial Geometric Consistency in dynamically generated videos. We quantify geometric consistency by measuring the divergence among multiple camera poses estimated from distinct local regions. Our approach first separates static from dynamic regions, then partitions the static background into spatially coherent sub-regions. We predict depth for each pixel, estimate a local camera pose for each sub-region, and compute the divergence among these poses to quantify geometric consistency. Experiments on real and generative videos demonstrate that SGC robustly quantifies geometric inconsistencies, effectively identifying critical failures missed by existing metrics.
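The last step of the pipeline described above, comparing the camera poses estimated from different static sub-regions, can be illustrated with a minimal sketch. The abstract does not give the divergence formula, so the choices here are assumptions made only for illustration: each pose is a rotation matrix plus a translation vector, and divergence is the mean pairwise geodesic rotation angle and the mean pairwise translation distance. The names `rotation_angle`, `pose_divergence`, and `rot_z` are hypothetical and do not come from the paper.

```python
import itertools
import numpy as np


def rotation_angle(R_a, R_b):
    """Geodesic angle in radians between two 3x3 rotation matrices."""
    cos_theta = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def pose_divergence(poses):
    """Assumed divergence: mean pairwise rotation angle and translation distance.

    poses: list of (R, t) tuples, one per static sub-region, where R is a
    3x3 rotation matrix and t is a length-3 translation vector.
    Returns (mean_rotation_angle, mean_translation_distance).
    """
    pairs = list(itertools.combinations(poses, 2))
    if not pairs:
        return 0.0, 0.0
    rot = np.mean([rotation_angle(Ra, Rb) for (Ra, _), (Rb, _) in pairs])
    trans = np.mean([np.linalg.norm(ta - tb) for (_, ta), (_, tb) in pairs])
    return float(rot), float(trans)


def rot_z(theta):
    """Helper for the example: rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Three sub-regions that agree on the camera motion (geometrically consistent).
consistent = [(rot_z(0.1), np.array([1.0, 0.0, 0.0])) for _ in range(3)]

# Three sub-regions that disagree (geometrically inconsistent).
inconsistent = [
    (rot_z(0.1), np.array([1.0, 0.0, 0.0])),
    (rot_z(0.3), np.array([1.0, 0.5, 0.0])),
    (rot_z(-0.2), np.array([0.0, 0.0, 1.0])),
]

print(pose_divergence(consistent))
print(pose_divergence(inconsistent))
```

Under these assumptions, a rigid static background yields sub-region poses that agree, so both divergence terms stay near zero. Disagreeing poses push them up, which matches the intuition behind measuring consistency on the background only, so that valid foreground motion is not penalized.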

