CV · RO · Apr 24

GenAssets: Generating in-the-wild 3D Assets in Latent Space

arXiv:2604.23010 · 79.77 citations
AI Analysis

This work addresses the need for diverse and realistic 3D asset generation for autonomous driving simulation, where existing methods struggle with sparse and occluded in-the-wild data.

The paper proposes a 3D latent diffusion model that generates complete and high-quality 3D assets for traffic participants from in-the-wild LiDAR and camera data, outperforming existing reconstruction and generation methods in terms of geometry and appearance quality.
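The page gives no details of the model's noise schedule, denoiser, or latent layout. The following is therefore only a generic sketch of how a latent diffusion model is typically trained: a standard DDPM-style forward noising process and ε-prediction loss applied to flat per-object latent vectors. The linear β schedule, the `(batch, dim)` latent shape, and the names `linear_beta_schedule`, `q_sample`, and `diffusion_loss` are assumptions for illustration, not GenAssets' implementation.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule as in the original DDPM setup (an assumption here)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(z0, t, alpha_bar, noise):
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps.

    z0: clean object latents, shape (B, D); t: integer timesteps, shape (B,).
    """
    a = alpha_bar[t][:, None]  # per-sample cumulative signal level, shape (B, 1)
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise

def diffusion_loss(denoiser, z0, alpha_bar, rng):
    """Epsilon-prediction MSE for one batch of latents (hypothetical denoiser(zt, t))."""
    batch = z0.shape[0]
    t = rng.integers(0, len(alpha_bar), size=batch)  # random timestep per sample
    noise = rng.standard_normal(z0.shape)            # the noise the model must predict
    zt = q_sample(z0, t, alpha_bar, noise)
    pred = denoiser(zt, t)
    return float(np.mean((pred - noise) ** 2))

# Usage: build the schedule and score a placeholder denoiser that predicts zeros.
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
latents = rng.standard_normal((256, 64))
baseline = diffusion_loss(lambda zt, t: np.zeros_like(zt), latents, alpha_bar, rng)
```

A denoiser that always predicts zero scores roughly the variance of the unit Gaussian noise (about 1.0), while one that recovers the injected noise exactly scores near 0. A real model would replace the lambda with a learned network conditioned on `t`.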

High-quality 3D assets for traffic participants are critical for multi-sensor simulation, which is essential for the safe end-to-end development of autonomy. Building assets from in-the-wild data is key for diversity and realism, but existing neural-rendering-based reconstruction methods are slow and generate assets that render well only from viewpoints close to the original observations, limiting their usefulness in simulation. Recent diffusion-based generative models build complete and diverse assets, but perform poorly on in-the-wild driving scenes, where actors are observed from sparse viewpoints with limited fields of view and are partially occluded. In this work, we propose a 3D latent diffusion model that learns on in-the-wild LiDAR and camera data captured by a sensor platform and generates high-quality 3D assets with complete geometry and appearance. Key to our method is a "reconstruct-then-generate" approach that first leverages occlusion-aware neural rendering trained over multiple scenes to build a high-quality latent space for objects, and then trains a diffusion model that operates on the latent space. We show our method outperforms existing reconstruction and generation based methods, unlocking diverse and scalable content creation for simulation.
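The abstract does not specify the renderer or how occlusion is handled. The sketch below only illustrates, under heavy simplifying assumptions, the two ideas named in the "reconstruct-then-generate" step. First, each object gets one latent code, optimized jointly with a decoder shared across all objects and scenes (an auto-decoder setup). Second, the reconstruction loss skips occluded observations. A linear decoder stands in for the neural renderer and a binary `visible` mask stands in for occlusion reasoning. Both stand-ins, and the function names, are hypothetical.

```python
import numpy as np

def masked_mse(pred, target, visible):
    """MSE over visible entries only; occluded entries (visible == 0) contribute nothing."""
    n = max(float(visible.sum()), 1.0)
    resid = visible * (pred - target)  # zeroed wherever the observation is occluded
    return float((resid ** 2).sum() / n), resid, n

def fit_shared_latents(targets, visible, latent_dim=4, steps=2000, lr=1.0, seed=0):
    """Hypothetical auto-decoder: per-object codes Z and a shared linear decoder W.

    targets: (num_objects, num_obs) observations (stand-in for pixels / LiDAR returns).
    visible: same shape, 1.0 if observed and 0.0 if occluded.
    """
    rng = np.random.default_rng(seed)
    num_objects, num_obs = targets.shape
    Z = 0.5 * rng.standard_normal((num_objects, latent_dim))  # one latent code per object
    W = 0.5 * rng.standard_normal((latent_dim, num_obs))      # decoder shared by all objects
    history = []
    for _ in range(steps):
        loss, resid, n = masked_mse(Z @ W, targets, visible)
        history.append(loss)
        grad_pred = 2.0 * resid / n  # dL/d(pred); exactly zero at occluded entries
        grad_Z = grad_pred @ W.T     # both gradients use the pre-update Z and W
        grad_W = Z.T @ grad_pred
        Z -= lr * grad_Z
        W -= lr * grad_W
    return Z, W, history

# Usage: synthetic low-rank "objects" with about 20% of observations occluded.
rng = np.random.default_rng(1)
targets = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 30)) / np.sqrt(3)
visible = (rng.random((20, 30)) < 0.8).astype(float)
Z, W, history = fit_shared_latents(targets, visible)
```

Occluded entries are multiplied by zero before the loss and gradients are formed, so changing their target values has no effect on the learned codes. This is the property an occlusion-aware objective needs. The resulting per-object codes are the kind of latent space a diffusion model would then be trained on.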

