CV · Apr 15

Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias

Zhiyuan Xu, Jiuming Liu, Yuxin Chen, Masayoshi Tomizuka, Chenfeng Xu, Chensheng Peng

Berkeley

arXiv:2604.13905 · 22.4 · h-index: 10

Predicted impact: top 31% in CV · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the efficiency and input-view bias problems in image-to-3D generation, offering a practical alternative to dense 3D representations for generative modeling.

SparseGen introduces a sparse query-based framework for image-to-3D generation that reduces memory and inference time while preserving multi-view fidelity, achieving significant efficiency gains over dense representations.
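A back-of-the-envelope count shows why a sparse set of queries can be cheaper than a dense grid. The sizes used below are hypothetical illustrations, not SparseGen's actual configuration: a 128³ grid with 32 channels, 2,048 queries of width 512, and 16 Gaussians per query. The 14-parameter Gaussian layout is also an assumption, following a common splatting convention. The only point is that dense storage grows with the cube of the resolution, while the query set grows linearly with the number of queries.

```python
# Back-of-the-envelope storage comparison. All sizes are hypothetical,
# chosen for illustration; they are not taken from the SparseGen paper.


def dense_grid_floats(resolution: int, channels: int) -> int:
    """Floats needed for a dense R x R x R feature grid with C channels."""
    return resolution ** 3 * channels


def sparse_query_floats(num_queries: int, width: int) -> int:
    """Floats needed for a set of N latent queries of width D."""
    return num_queries * width


def gaussian_floats(num_queries: int, per_query: int, params: int = 14) -> int:
    """Floats for N * K decoded Gaussians. The assumed 14 params per primitive are
    3 (mean) + 3 (scale) + 4 (rotation quaternion) + 1 (opacity) + 3 (RGB)."""
    return num_queries * per_query * params


if __name__ == "__main__":
    dense = dense_grid_floats(resolution=128, channels=32)
    latent = sparse_query_floats(num_queries=2048, width=512)
    prims = gaussian_floats(num_queries=2048, per_query=16)
    print(f"dense grid:        {dense:,} floats")
    print(f"sparse queries:    {latent:,} floats")
    print(f"decoded Gaussians: {prims:,} floats")
    print(f"dense / sparse latent ratio: {dense / latent:.0f}x")
```

With these illustrative sizes, the dense grid holds 67,108,864 floats against 1,048,576 for the query set, a 64x difference. Doubling the grid resolution would multiply the dense count by 8, while the query count stays fixed.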

We present SparseGen, a novel framework for efficient image-to-3D generation, which exhibits low input-view bias while being significantly faster than dense-representation approaches. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while being representationally efficient. Our results argue that sparse set-latent expansion is a principled, practical alternative for efficient 3D generative modeling.
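The core idea is a set of anchor queries, each expanded into a handful of local Gaussians. It can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not SparseGen's implementation:

- The expansion operator is assumed to be a single linear layer, with random weights standing in for learned ones.
- The anchors are random rather than learned.
- The per-Gaussian parameterization (bounded offset from the anchor, log-scale, quaternion, opacity, color) is a common 3D Gaussian splatting convention, not one confirmed by the paper.
- The names `expand_queries`, `NUM_QUERIES`, `QUERY_DIM`, `GAUSSIANS_PER_Q`, and `PARAMS` are hypothetical.

```python
import numpy as np

# Hypothetical sizes for illustration only.
NUM_QUERIES = 256     # N: number of sparse anchor queries
QUERY_DIM = 64        # D: latent width of each (transformed) query
GAUSSIANS_PER_Q = 8   # K: Gaussians decoded from each query
# Assumed per-Gaussian layout: 3 offset + 3 log-scale + 4 quaternion + 1 opacity + 3 color
PARAMS = 3 + 3 + 4 + 1 + 3


def expand_queries(anchors, features, weight, bias, max_offset=0.05):
    """Decode each query into K Gaussians placed near its anchor.

    anchors:  (N, 3) 3D anchor positions of the queries
    features: (N, D) query features (e.g. after a transformer)
    weight:   (D, K * PARAMS) and bias: (K * PARAMS,) form the expansion
              operator, assumed here to be a single linear layer.
    """
    n = anchors.shape[0]
    raw = (features @ weight + bias).reshape(n, GAUSSIANS_PER_Q, PARAMS)
    offset, log_scale, quat, opacity_logit, color_logit = np.split(
        raw, [3, 6, 10, 11], axis=-1
    )
    # Bounded offsets (tanh) keep every Gaussian within max_offset of its anchor.
    means = anchors[:, None, :] + max_offset * np.tanh(offset)
    scales = np.exp(log_scale)  # positive scales
    norms = np.linalg.norm(quat, axis=-1, keepdims=True)
    quats = quat / np.maximum(norms, 1e-8)  # unit quaternions
    opacity = 1.0 / (1.0 + np.exp(-opacity_logit))  # sigmoid -> (0, 1)
    colors = 1.0 / (1.0 + np.exp(-color_logit))     # sigmoid -> (0, 1)
    parts = {"means": means, "scales": scales, "quats": quats,
             "opacity": opacity, "colors": colors}
    # Flatten (N, K, .) -> (N * K, .): the scene is the union of all local sets.
    return {k: v.reshape(n * GAUSSIANS_PER_Q, -1) for k, v in parts.items()}


rng = np.random.default_rng(0)
anchors = rng.uniform(-1.0, 1.0, size=(NUM_QUERIES, 3))
features = rng.normal(size=(NUM_QUERIES, QUERY_DIM))
weight = 0.02 * rng.normal(size=(QUERY_DIM, GAUSSIANS_PER_Q * PARAMS))
bias = np.zeros(GAUSSIANS_PER_Q * PARAMS)

gaussians = expand_queries(anchors, features, weight, bias)
print({k: v.shape for k, v in gaussians.items()})
```

The bounded offset is one simple way to make each query responsible for a local region, so the total primitive count is fixed at N x K regardless of scene resolution. In a full system, the Gaussians would be rendered into target views and trained under the rectified-flow objective the abstract describes. That part is omitted here.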
