CV · Mar 26, 2025

MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation

arXiv:2503.20519v3 · 9 citations · h-index: 10 · CVPR
Originality: Highly original
AI Analysis

This work addresses 3D generation for computer graphics and AI applications, proposing a new method that targets specific bottlenecks in applying auto-regressive models to 3D data.

The paper tackles the challenges of applying auto-regressive transformers to 3D generation, such as unordered data and compression loss, by introducing MAR-3D, which achieves superior performance over existing methods and scales better than joint distribution modeling approaches such as diffusion transformers.

Recent advances in auto-regressive transformers have revolutionized generative modeling across domains, from language processing to visual generation, demonstrating remarkable capabilities. However, applying these advances to 3D generation presents three key challenges: the unordered nature of 3D data conflicts with the sequential next-token prediction paradigm; conventional vector quantization approaches incur substantial compression loss when applied to 3D meshes; and efficient scaling strategies for higher-resolution latent prediction are lacking. To address these challenges, we introduce MAR-3D, which integrates a pyramid variational autoencoder with a cascaded masked auto-regressive transformer (Cascaded MAR) for progressive latent upscaling in the continuous space. Our architecture employs random masking during training and auto-regressive denoising in random order during inference, naturally accommodating the unordered property of 3D latent tokens. Additionally, we propose a cascaded training strategy with condition augmentation that enables efficient upscaling of the latent token resolution with fast convergence. Extensive experiments demonstrate that MAR-3D not only achieves superior performance and generalization compared to existing methods but also scales better than joint distribution modeling approaches (e.g., diffusion transformers).
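The three mechanisms named in the abstract (random masking during training, random-order auto-regressive generation at inference, and condition augmentation between cascade stages) can be illustrated with a small, framework-free sketch. It is not MAR-3D's implementation. The mask-ratio range, the cosine schedule (borrowed from MaskGIT/MAR-style image generators), the Gaussian-noise form of condition augmentation, and every name below (`training_mask`, `tokens_per_step`, `random_order_generate`, `augment_condition`, `Predictor`) are hypothetical. Scalar floats stand in for continuous latent token vectors.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical stand-in for the transformer plus per-token denoising head:
# given already-generated tokens (index -> value) and the indices to fill,
# it returns one continuous value per target index.
Predictor = Callable[[Dict[int, float], List[int]], List[float]]


def training_mask(num_tokens: int, rng: random.Random, min_ratio: float = 0.7) -> List[bool]:
    """Randomly mask a subset of tokens (True = masked) for one training example.

    The ratio range [min_ratio, 1.0] is an assumed value, not one from MAR-3D.
    The subset is drawn without any ordering, so no token position is
    privileged, which suits unordered 3D latent tokens.
    """
    ratio = rng.uniform(min_ratio, 1.0)
    num_masked = min(num_tokens, max(1, math.ceil(ratio * num_tokens)))
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked for i in range(num_tokens)]


def tokens_per_step(num_tokens: int, num_steps: int) -> List[int]:
    """Cosine schedule (assumed): how many tokens to generate at each step.

    Few tokens are produced early and more later; every step produces at
    least one token and the counts sum to num_tokens.
    """
    if not 1 <= num_steps <= num_tokens:
        raise ValueError("need 1 <= num_steps <= num_tokens")
    counts: List[int] = []
    prev_masked = num_tokens
    for step in range(1, num_steps + 1):
        masked = math.floor(num_tokens * math.cos(math.pi / 2 * step / num_steps))
        # Keep at least one token for each remaining step, and always make progress.
        masked = min(max(masked, num_steps - step), prev_masked - 1)
        counts.append(prev_masked - masked)
        prev_masked = masked
    return counts


def random_order_generate(num_tokens: int, num_steps: int, predict: Predictor,
                          rng: random.Random) -> List[float]:
    """Fill all tokens over num_steps steps, visiting positions in a random order."""
    order = list(range(num_tokens))
    rng.shuffle(order)
    generated: Dict[int, float] = {}
    start = 0
    for count in tokens_per_step(num_tokens, num_steps):
        targets = order[start:start + count]
        # Condition on everything generated so far (a copy, so predict cannot mutate it).
        values = predict(dict(generated), targets)
        generated.update(zip(targets, values))
        start += count
    return [generated[i] for i in range(num_tokens)]


def augment_condition(latents: List[float], rng: random.Random,
                      max_sigma: float = 0.5) -> List[float]:
    """Condition augmentation (assumed form): add Gaussian noise of random
    strength to the low-resolution latents that condition the next stage."""
    sigma = rng.uniform(0.0, max_sigma)
    return [x + rng.gauss(0.0, sigma) for x in latents]
```

For example, with a toy predictor `lambda known, targets: [float(len(known))] * len(targets)`, `random_order_generate(8, 4, toy, random.Random(0))` fills one token at the first step and three at the last, because `tokens_per_step(8, 4)` returns `[1, 2, 2, 3]`. Since the visiting order is a random permutation, the model never relies on a fixed sequence over the 3D tokens, which is the property the abstract highlights.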
