CV, Oct 13, 2025

MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps

arXiv:2510.11107v1, 1 citation, h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the problem of 3D scene motion prediction for computer vision applications, representing an incremental advancement by building on existing generative image models.

The paper tackles the challenge of learning 3D motion priors from real-world videos in order to predict future scene motion from a single image. It proposes a pixel-aligned Motion Map representation and trains a diffusion model on a database built from over 50,000 real videos; the reported results show plausible and semantically consistent motion generation.

This paper addresses the challenge of learning semantically and functionally meaningful 3D motion priors from real-world videos, in order to enable prediction of future 3D scene motion from a single input image. We propose a novel pixel-aligned Motion Map (MoMap) representation for 3D scene motion, which can be generated from existing generative image models to facilitate efficient and effective motion prediction. To learn meaningful distributions over motion, we create a large-scale database of MoMaps from over 50,000 real videos and train a diffusion model on these representations. Our motion generation not only synthesizes trajectories in 3D but also suggests a new pipeline for 2D video synthesis: first generate a MoMap, then warp an image accordingly and complete the warped point-based renderings. Experimental results demonstrate that our approach generates plausible and semantically consistent 3D scene motion.
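The warp step of the pipeline described above (generate a MoMap, warp the image, complete the point-based rendering) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes, hypothetically, that a MoMap is stored as an array of shape (T, H, W, 3) holding each pixel's 3D position at every time step, and that the camera is a simple pinhole model; the abstract specifies neither. The names `unproject` and `splat` and the intrinsics `fx, fy, cx, cy` are illustrative only.

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to per-pixel 3D points (H, W, 3), pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def splat(image, points, fx, fy, cx, cy):
    """Forward-warp `image` (H, W, C) by projecting each pixel's 3D point.

    Returns the warped image and a boolean mask of pixels that received a
    point. Unfilled pixels are the holes a completion model would fill.
    """
    h, w, c = image.shape
    pts = points.reshape(-1, 3)
    colors = image.reshape(-1, c)
    # Discard points at or behind the camera plane.
    front = pts[:, 2] > 1e-6
    pts, colors = pts[front], colors[front]
    u = np.rint(pts[:, 0] / pts[:, 2] * fx + cx).astype(int)
    v = np.rint(pts[:, 1] / pts[:, 2] * fy + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    pix = (v * w + u)[inside]
    z = pts[inside, 2]
    colors = colors[inside]
    # Sort by target pixel, then by depth, and keep the nearest point per pixel.
    order = np.lexsort((z, pix))
    target, first = np.unique(pix[order], return_index=True)
    winners = order[first]
    out = np.zeros((h * w, c), dtype=image.dtype)
    mask = np.zeros(h * w, dtype=bool)
    out[target] = colors[winners]
    mask[target] = True
    return out.reshape(h, w, c), mask.reshape(h, w)

# Toy example: a flat scene at depth 2 that slides one pixel to the right
# per time step. In the described pipeline a generative model would produce
# the MoMap; here it is hand-built for illustration.
fx = fy = 4.0
cx = cy = 1.5
image = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
points0 = unproject(np.full((4, 4), 2.0), fx, fy, cx, cy)
step = np.array([2.0 / fx, 0.0, 0.0])  # one pixel of horizontal motion at depth 2
momap = np.stack([points0 + t * step for t in range(3)])  # (T, H, W, 3)

frames = [splat(image, momap[t], fx, fy, cx, cy) for t in range(len(momap))]
```

Each entry of `frames` pairs a warped image with its coverage mask. The columns left uncovered as the content slides right are the regions that the completion stage of such a pipeline would need to synthesize.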

