CV | Sep 29, 2025

RapidMV: Leveraging Spatio-Angular Representations for Efficient and Consistent Text-to-Multi-View Synthesis

arXiv:2509.24410v1 | h-index: 10
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient 3D asset generation from text, though it appears incremental because it builds on existing text-to-multi-view synthesis approaches.

The paper tackles the problem of generating synthetic multi-view images from text prompts. It introduces RapidMV, which produces 32 multi-view images in about 5 seconds and improves on existing methods in both consistency and latency.

Generating synthetic multi-view images from a text prompt is an essential bridge to generating synthetic 3D assets. In this work, we introduce RapidMV, a novel text-to-multi-view generative model that can produce 32 multi-view synthetic images in just around 5 seconds. In essence, we propose a novel spatio-angular latent space, encoding both the spatial appearance and angular viewpoint deviations into a single latent for improved efficiency and multi-view consistency. We achieve effective training of RapidMV by strategically decomposing our training process into multiple steps. We demonstrate that RapidMV outperforms existing methods in terms of consistency and latency, with competitive quality and text-image alignment.

