CV · Oct 7, 2025

ShapeGen4D: Towards High Quality 4D Shape Generation from Videos

arXiv:2510.06208v1 · 16 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating high-quality 4D shapes from videos for applications in computer vision and graphics, representing an incremental advancement with novel components.

The paper tackles video-conditioned 4D shape generation by introducing a framework that synthesizes a dynamic 3D representation from an input video, improving robustness and perceptual fidelity on diverse in-the-wild videos compared with baselines.

Video-conditioned 4D shape generation aims to recover time-varying 3D geometry and view-consistent appearance directly from an input video. In this work, we introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Our framework introduces three key components based on large-scale pre-trained 3D models: (i) a temporal attention that conditions generation on all frames while producing a time-indexed dynamic representation; (ii) a time-aware point sampling and 4D latent anchoring that promote temporally consistent geometry and texture; and (iii) noise sharing across frames to enhance temporal stability. Our method accurately captures non-rigid motion, volume changes, and even topological transitions without per-frame optimization. Across diverse in-the-wild videos, our method improves robustness and perceptual fidelity and reduces failure modes compared with the baselines.
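The idea behind component (iii), noise sharing across frames, can be illustrated with a minimal sketch. This is not the authors' implementation: the flat per-frame latent vector, the function names `shared_initial_noise` and `independent_initial_noise`, and the use of Python's standard `random` module are all assumptions made for illustration. The sketch only shows the contrast between starting every frame's denoising from the same Gaussian sample and starting each frame from its own sample.

```python
import random


def shared_initial_noise(num_frames, latent_len, seed=0):
    """Draw one Gaussian noise vector and reuse it for every frame.

    Hypothetical helper: a real model would use latents with more
    structure than a flat list of floats.
    """
    rng = random.Random(seed)
    base = [rng.gauss(0.0, 1.0) for _ in range(latent_len)]
    # Each frame gets its own copy of the same values, so later
    # per-frame updates do not alias one another.
    return [list(base) for _ in range(num_frames)]


def independent_initial_noise(num_frames, latent_len, seed=0):
    """Baseline for comparison: a fresh noise vector per frame."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(latent_len)]
        for _ in range(num_frames)
    ]


shared = shared_initial_noise(num_frames=4, latent_len=8)
independent = independent_initial_noise(num_frames=4, latent_len=8)

# With shared noise, every frame starts from an identical sample.
print(all(frame == shared[0] for frame in shared))
# With independent noise, the starting points differ between frames.
print(independent[0] == independent[1])
```

With shared noise, differences between frames after denoising can come only from the per-frame conditioning, not from the initial sample, which is one plausible reading of why the abstract associates the technique with temporal stability.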

