CV · AI · LG · Apr 18, 2024

6Img-to-3D: Few-Image Large-Scale Outdoor Driving Scene Reconstruction

arXiv:2404.12378v2 · 5 citations · h-index: 4 · Has Code · 2025 IEEE Intelligent Vehicles Symposium (IV)
Originality: Incremental advance
AI Analysis

This paper addresses efficient, scalable 3D reconstruction for autonomous driving applications, though it appears incremental, building on existing transformer and triplane methods.

The paper tackles the problem of reconstructing large-scale outdoor driving scenes from only six surround-view images without global pose information, achieving 360-degree scene reconstruction in 395 ms at inference time.

Current 3D reconstruction techniques struggle to faithfully infer unbounded scenes from a few images. Specifically, existing methods have high computational demands, require detailed pose information, and cannot reliably reconstruct occluded regions. We introduce 6Img-to-3D, an efficient, scalable transformer-based encoder-renderer method for single-shot image-to-3D reconstruction. Our method outputs a 3D-consistent parameterized triplane from only six outward-facing input images for large-scale, unbounded outdoor driving scenarios. We take a step towards resolving existing shortcomings by combining contracted custom cross- and self-attention mechanisms for triplane parameterization, differentiable volume rendering, scene contraction, and image feature projection. We show that six surround-view vehicle images from a single timestamp, without global pose information, are enough to reconstruct 360° scenes at inference time in 395 ms. Our method allows, for example, rendering third-person images and bird's-eye views. Our code is available at https://github.com/continental/6Img-to-3D, and more examples can be found on our website at https://6Img-to-3D.GitHub.io/.
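The abstract names scene contraction, a triplane representation, and differentiable volume rendering as building blocks. The NumPy sketch below shows how these generic pieces commonly fit together when rendering a single ray; it is not the 6Img-to-3D implementation. Assumptions: a Mip-NeRF 360-style contraction, features from the three axis-aligned planes combined by summation (as in EG3D-style triplanes), a hypothetical linear decoder head with random weights, and standard NeRF alpha compositing. The transformer encoder and image feature projection that would produce the triplane from the six images are omitted; random planes stand in for them.

```python
import numpy as np


def contract(x: np.ndarray) -> np.ndarray:
    """Assumed Mip-NeRF 360-style contraction: points with ||x|| <= 1 stay put,
    farther points are squashed into the shell 1 < ||x|| < 2."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.maximum(norm, 1e-9)  # avoid division by zero at the origin
    squashed = (2.0 - 1.0 / safe) * (x / safe)
    return np.where(norm <= 1.0, x, squashed)


def bilinear(plane: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (C, H, W) plane at (N, 2) coordinates in [-1, 1]."""
    _, h, w = plane.shape
    px = (uv[:, 0] + 1.0) * 0.5 * (w - 1)  # u -> continuous column index
    py = (uv[:, 1] + 1.0) * 0.5 * (h - 1)  # v -> continuous row index
    x0 = np.clip(np.floor(px).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(py).astype(int), 0, h - 2)
    fx = (px - x0)[:, None]
    fy = (py - y0)[:, None]
    p = plane.transpose(1, 2, 0)  # (H, W, C) so p[rows, cols] -> (N, C)
    top = p[y0, x0] * (1 - fx) + p[y0, x0 + 1] * fx
    bottom = p[y0 + 1, x0] * (1 - fx) + p[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def sample_triplane(planes: dict, pts: np.ndarray) -> np.ndarray:
    """Project (N, 3) points onto the xy, xz and yz planes and sum the features."""
    q = contract(pts) / 2.0  # contracted coords lie in (-2, 2); rescale to (-1, 1)
    return (bilinear(planes["xy"], q[:, [0, 1]])
            + bilinear(planes["xz"], q[:, [0, 2]])
            + bilinear(planes["yz"], q[:, [1, 2]]))


def composite(sigma: np.ndarray, rgb: np.ndarray, deltas: np.ndarray):
    """Standard NeRF alpha compositing of S samples along a single ray."""
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0), weights


# Toy usage: random planes stand in for the encoder's predicted triplane.
rng = np.random.default_rng(0)
C, H, W = 8, 32, 32
planes = {k: rng.normal(size=(C, H, W)) for k in ("xy", "xz", "yz")}
t = np.linspace(0.5, 50.0, 64)  # sample depths along one ray (arbitrary units)
pts = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=-1)  # ray along +x
feats = sample_triplane(planes, pts)           # (64, C)
head = rng.normal(size=(C, 4)) * 0.1           # hypothetical linear decoder
out = feats @ head
sigma = np.log1p(np.exp(out[:, 0]))            # softplus keeps density >= 0
rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))        # sigmoid keeps colour in [0, 1]
deltas = np.append(np.diff(t), t[1] - t[0])    # spacing between samples
color, weights = composite(sigma, rgb, deltas)
```

Because contraction maps even very distant points into a bounded region, a fixed-resolution triplane can cover an unbounded outdoor scene. Summing the three plane features is only one common choice; concatenation is an equally plausible alternative, and the paper's actual contraction, aggregation, and decoder may differ from this sketch.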

Code Implementations: 1 repo