CV · Mar 15, 2024

Robust Shape Fitting for 3D Scene Abstraction

arXiv:2403.10452v1 · 6 citations · h-index: 28 · IEEE Trans Pattern Anal Mach Intell
Originality: Incremental advance
AI Analysis

This work targets high-level scene understanding for robotics or AR/VR applications by providing robust primitive fitting. It appears incremental: it builds on existing methods and improves their robustness and efficiency.

The paper tackles the problem of abstracting complex real-world 3D scenes into simple parametric models like cuboids, achieving successful abstraction on cluttered scenes from the NYU Depth v2 dataset without requiring labor-intensive cuboid annotations for training.

Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural network based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
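The guided-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: a plane stands in for the cuboid primitive, a fixed list of per-point `weights` stands in for the sampling probabilities that the paper's neural network would predict, and the plain point-to-plane distance replaces the occlusion-aware metric. All function names (`fit_plane`, `point_plane_distance`, `guided_ransac`) are hypothetical.

```python
import random


def fit_plane(p, q, r):
    """Plane through three 3D points as (unit normal, offset), or None if degenerate."""
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    # Cross product of the two edge vectors gives the plane normal.
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:  # repeated or collinear points
        return None
    n = tuple(c / norm for c in n)
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def point_plane_distance(model, x):
    """Unsigned distance from point x to the plane (assumed stand-in metric)."""
    n, d = model
    return abs(sum(n[i] * x[i] for i in range(3)) + d)


def guided_ransac(points, weights, iterations=200, threshold=0.05, seed=0):
    """RANSAC whose minimal sets are drawn according to per-point weights."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Weighted sampling: points with larger weights are drawn more often.
        sample = rng.choices(points, weights=weights, k=3)
        model = fit_plane(*sample)
        if model is None:
            continue
        inliers = [i for i, x in enumerate(points)
                   if point_plane_distance(model, x) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

To mimic the sequential parsing described above, one could call `guided_ransac` repeatedly and set the weights of already explained points to a small value before each new call. The paper instead conditions its network on the previously detected parts of the scene.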

Code Implementations: 1 repo
