CV · Jul 24, 2025

MVG4D: Image Matrix-Based Multi-View and Motion Generation for 4D Content Creation from a Single Image

arXiv:2507.18371v2 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a key challenge in digital content creation for AR/VR applications: efficient 4D generation from minimal inputs. It builds incrementally on existing 4D Gaussian Splatting methods.

The paper tackles the problem of generating high-fidelity and temporally consistent dynamic 4D content from a single image, achieving state-of-the-art performance on metrics like CLIP-I, PSNR, and FVD while reducing flickering artifacts and improving time efficiency.

Advances in generative modeling have significantly enhanced digital content creation, extending from 2D images to complex 3D and 4D scenes. Despite substantial progress, producing high-fidelity and temporally consistent dynamic 4D content remains a challenge. In this paper, we propose MVG4D, a novel framework that generates dynamic 4D content from a single still image by combining multi-view synthesis with 4D Gaussian Splatting (4D GS). At its core, MVG4D employs an image matrix module that synthesizes temporally coherent and spatially diverse multi-view images, providing rich supervisory signals for downstream 3D and 4D reconstruction. These multi-view images are used to optimize a 3D Gaussian point cloud, which is further extended into the temporal domain via a lightweight deformation network. Our method effectively enhances temporal consistency, geometric fidelity, and visual realism, addressing key challenges in motion discontinuity and background degradation that affect prior 4D GS-based methods. Extensive experiments on the Objaverse dataset demonstrate that MVG4D outperforms state-of-the-art baselines in CLIP-I, PSNR, FVD, and time efficiency. Notably, it reduces flickering artifacts and sharpens structural details across views and time, enabling more immersive AR/VR experiences. MVG4D sets a new direction for efficient and controllable 4D generation from minimal inputs.
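Of the metrics named above, PSNR has a simple closed form. The sketch below uses the standard definition, PSNR = 10 · log10(MAX² / MSE). It assumes frames are flattened lists of pixel intensities in [0, 1]. The function names `psnr` and `mean_psnr` are illustrative, and the abstract does not specify the paper's exact evaluation protocol (resolution, color space, or how views and time steps are averaged).

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(reference_frames, rendered_frames, max_value=1.0):
    """Average per-frame PSNR over a sequence (an assumed aggregation)."""
    if len(reference_frames) != len(rendered_frames) or not reference_frames:
        raise ValueError("sequences must be non-empty and of equal length")
    scores = [psnr(r, o, max_value) for r, o in zip(reference_frames, rendered_frames)]
    return sum(scores) / len(scores)
```

A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore about 20 dB. Higher values mean the rendered frame is closer to the reference.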

