CV · Mar 24, 2024

latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

arXiv:2403.16292v2 · 125 citations · h-index: 137 · ECCV
Originality: Highly original
AI Analysis

This paper addresses the challenge of scalable 3D reconstruction for computer vision and graphics applications. It offers a hybrid approach, combining regression-based and generative methods, that aims to overcome the speed and generalization limits of existing techniques.

The paper tackles fast, generalizable 3D reconstruction from limited views. It introduces latentSplat, which predicts semantic Gaussians in a 3D latent space and decodes them with a lightweight generative 2D architecture, and it reports better reconstruction quality and generalization than previous methods.

We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D architecture. Existing methods for generalizable 3D reconstruction either do not scale to large scenes and resolutions, or are limited to interpolation of close input views. latentSplat combines the strengths of regression-based and generative approaches while being trained purely on readily available real video data. The core of our method are variational 3D Gaussians, a representation that efficiently encodes varying uncertainty within a latent space consisting of 3D feature Gaussians. From these Gaussians, specific instances can be sampled and rendered via efficient splatting and a fast, generative decoder. We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data.
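To make the idea of sampling "specific instances" from variational Gaussians more concrete, here is a minimal sketch of the standard reparameterization trick applied to per-Gaussian feature distributions. It is an illustration under stated assumptions, not the paper's implementation: the function name `sample_gaussian_features`, the mean/log-variance parameterization, and all array sizes are hypothetical, and the splatting and 2D decoding stages are omitted entirely.

```python
import numpy as np


def sample_gaussian_features(mean, log_var, rng):
    """Draw one latent feature vector per 3D Gaussian.

    Uses the reparameterization trick: sample = mean + std * eps,
    where eps is standard normal noise. The diagonal log-variance
    parameterization is an assumption made for this sketch.
    """
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps


rng = np.random.default_rng(0)
num_gaussians, feature_dim = 4, 8  # hypothetical sizes

# Hypothetical predicted feature means, one row per 3D Gaussian.
mean = rng.standard_normal((num_gaussians, feature_dim))

# First two Gaussians: very low variance (confident features).
# Last two Gaussians: unit variance (uncertain features).
log_var = np.concatenate(
    [
        np.full((2, feature_dim), -20.0),
        np.full((2, feature_dim), 0.0),
    ]
)

sample = sample_gaussian_features(mean, log_var, rng)
```

In this toy setup, the low-variance Gaussians return features that are almost identical to their means, while the high-variance ones differ from draw to draw. That mirrors, in a simplified way, how a representation with varying uncertainty could behave deterministically in some regions and generatively in others.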

