CV · LG · Apr 30, 2025

Direct Motion Models for Assessing Generated Videos

arXiv:2505.00209v1 · 9 citations · h-index: 31 · Has Code · ICML
Originality: Incremental advance
AI Analysis

The paper addresses a key limitation in the evaluation of generated videos, giving researchers and practitioners a more accurate and interpretable tool. The advance is incremental, since it builds on prior metrics.

The paper tackles the poor motion quality of generated videos, which existing metrics such as FVD do not capture well. It develops a novel metric based on auto-encoding point tracks that better measures plausible object interactions and motion. The metric is shown to be more sensitive to temporal distortions than alternatives and to predict human evaluations of realism better.

A current limitation of generative video models is that they generate plausible-looking frames but poor motion, an issue that is not well captured by FVD and other popular methods for evaluating generated videos. Here we go beyond FVD by developing a metric which better measures plausible object interactions and motion. Our novel approach is based on auto-encoding point tracks and yields motion features that can be used not only to compare distributions of videos (as few as one generated and one ground truth, or as many as two datasets), but also to evaluate the motion of single videos. We show that using point tracks instead of pixel reconstruction or action recognition features results in a metric which is markedly more sensitive to temporal distortions in synthetic data, and which can predict human evaluations of temporal consistency and realism in generated videos obtained from open-source models better than a wide range of alternatives. We also show that by using a point track representation, we can spatiotemporally localize generative video inconsistencies, providing extra interpretability of generated video errors relative to prior work. An overview of the results and a link to the code can be found on the project page: http://trajan-paper.github.io.
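
As a rough illustration of what "comparing distributions of videos" through motion features can look like, the sketch below computes a Fréchet-style distance between two sets of per-video feature vectors. This is the kind of statistic FVD applies to its own features. It is not taken from the paper: the function names are hypothetical, the feature vectors are assumed to come from some motion encoder that is not shown, and each set is modeled as a Gaussian with a diagonal covariance to keep the example dependency-free. Whether the paper's metric uses this exact distance is an assumption.

```python
import math
from typing import List, Sequence, Tuple

# One feature vector per video (hypothetical output of a motion encoder).
Features = Sequence[Sequence[float]]


def _mean_and_std(features: Features) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population standard deviation."""
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)
        for d in range(dim)
    ]
    return means, stds


def diagonal_frechet_distance(real: Features, generated: Features) -> float:
    """Squared Frechet distance between two diagonal-covariance Gaussians.

    Simplifying assumption: feature dimensions are treated as independent,
    so the distance is ||mu_r - mu_g||^2 + sum_d (sigma_r[d] - sigma_g[d])^2.
    """
    mu_r, sd_r = _mean_and_std(real)
    mu_g, sd_g = _mean_and_std(generated)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    spread_term = sum((a - b) ** 2 for a, b in zip(sd_r, sd_g))
    return mean_term + spread_term


if __name__ == "__main__":
    real = [[0.0, 0.0], [2.0, 0.0]]
    generated = [[1.0, 1.0], [3.0, 1.0]]
    print(diagonal_frechet_distance(real, generated))  # 2.0
```

With a single feature vector in each set, the standard deviations are zero and the value reduces to the squared Euclidean distance between the two vectors, so the same function covers both the one-versus-one and the dataset-versus-dataset case in this simplified setting.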

