CV · LG · Sep 9, 2021

Single Image 3D Object Estimation with Primitive Graph Networks

arXiv:2109.04153v1
AI Analysis

This paper addresses the challenge of estimating 3D objects from single images for applications in visual scene understanding, though it appears incremental, as it builds on existing primitive-based and graph network approaches.

The paper tackles the problem of reconstructing 3D objects from a single image by proposing a two-stage graph network that uses primitive-based representations. It reports outperforming previous state-of-the-art methods on the Pix3D, ModelNet, and NYU Depth V2 benchmarks by a considerable margin.

Reconstructing a 3D object from a single image (RGB or depth) is a fundamental problem in visual scene understanding, yet it remains challenging due to its ill-posed nature and the complexity of real-world scenes. To address these challenges, we adopt a primitive-based representation for 3D objects and propose a two-stage graph network for primitive-based 3D object estimation, which consists of a sequential proposal module and a graph reasoning module. Given a 2D image, our proposal module first generates a sequence of 3D primitives from the input image with local feature attention. The graph reasoning module then performs joint reasoning on a primitive graph to capture the global shape context for each primitive. Such a framework can take rich geometric and semantic constraints into account during 3D structure recovery, producing 3D objects with more coherent structure even under challenging viewing conditions. We train the entire graph neural network with a stage-wise strategy and evaluate it on three benchmarks: Pix3D, ModelNet, and NYU Depth V2. Extensive experiments show that our approach outperforms the previous state of the art by a considerable margin.
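The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the learned networks are replaced by fixed arithmetic, primitives are plain feature vectors, and the primitive graph is assumed to be fully connected. All names (`attend`, `propose_primitives`, `graph_reasoning`) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, local_feats):
    """Dot-product attention of one query vector over local image features."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in local_feats]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * feat[d] for w, feat in zip(weights, local_feats))
            for d in range(dim)]


def propose_primitives(local_feats, num_primitives):
    """Stage 1 (toy): emit primitives one at a time.

    Each step attends to the local features using the running state,
    then mixes the attended context into that state. A real model
    would use learned recurrent and attention weights instead.
    """
    state = [0.0] * len(local_feats[0])
    primitives = []
    for _ in range(num_primitives):
        context = attend(state, local_feats)
        state = [0.5 * s + 0.5 * c for s, c in zip(state, context)]
        primitives.append(list(state))
    return primitives


def graph_reasoning(primitives, rounds=2):
    """Stage 2 (toy): message passing on an assumed fully connected graph.

    Each node is updated with the mean of all other nodes, so every
    primitive sees global shape context after one round.
    """
    nodes = [list(p) for p in primitives]
    for _ in range(rounds):
        updated = []
        for i, node in enumerate(nodes):
            others = [n for j, n in enumerate(nodes) if j != i] or [node]
            message = [sum(col) / len(others) for col in zip(*others)]
            updated.append([0.5 * a + 0.5 * b for a, b in zip(node, message)])
        nodes = updated
    return nodes


# Hypothetical input: 6 local feature vectors of dimension 4.
random.seed(0)
local_feats = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
proposals = propose_primitives(local_feats, num_primitives=3)
refined = graph_reasoning(proposals)
```

In this sketch the first stage is sequential, since each proposal depends on the previous state, while the second stage updates all primitives jointly. That mirrors only the division of labour the abstract describes, not the actual layers, losses, or primitive parameterization used in the paper.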
