CV · Jul 27, 2021

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

Bin Tan, Nan Xue, Song Bai, Tianfu Wu, Gui-Song Xia

arXiv:2107.13108v1 · 16.251 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of 3D scene understanding for computer vision applications and represents an incremental improvement over existing methods.

The paper tackles the problem of simultaneously detecting and reconstructing 3D planes from a single image, achieving state-of-the-art performance on ScanNet and NYUv2 datasets.

This paper presents a neural network built upon Transformers, namely PlaneTR, to simultaneously detect and reconstruct planes from a single image. Different from previous methods, PlaneTR jointly leverages the context information and the geometric structures in a sequence-to-sequence way to holistically detect plane instances in one forward pass. Specifically, we represent the geometric structures as line segments and construct the network with three main components: (i) context and line segment encoders, (ii) a structure-guided plane decoder, and (iii) a pixel-wise plane embedding decoder. Given an image and its detected line segments, PlaneTR generates the context and line segment sequences via two specially designed encoders and then feeds them into a Transformer-based decoder to directly predict a sequence of plane instances by simultaneously considering the context and global structure cues. Finally, pixel-wise embeddings are computed to assign each pixel to the predicted plane instance nearest to it in embedding space. Comprehensive experiments demonstrate that PlaneTR achieves state-of-the-art performance on the ScanNet and NYUv2 datasets.
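
The final step of the abstract, assigning each pixel to the nearest predicted plane in embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `assign_pixels_to_planes`, the toy 2-D embeddings, and the choice of Euclidean distance are all assumptions made for illustration (the abstract says only "nearest").

```python
from math import dist


def assign_pixels_to_planes(pixel_embeddings, plane_embeddings):
    """Return, for each pixel embedding, the index of the nearest plane embedding.

    pixel_embeddings: list of float sequences, one per pixel.
    plane_embeddings: non-empty list of float sequences, one per predicted plane.
    Hypothetical helper; names and distance metric are illustrative assumptions.
    """
    if not plane_embeddings:
        raise ValueError("at least one plane embedding is required")
    labels = []
    for pixel in pixel_embeddings:
        # Euclidean distance is assumed here; the abstract does not name a metric.
        distances = [dist(pixel, plane) for plane in plane_embeddings]
        # Ties resolve to the lowest plane index.
        labels.append(distances.index(min(distances)))
    return labels


# Toy example: four "pixels" and two "plane" embeddings in a 2-D embedding space.
pixels = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (1.0, 0.8)]
planes = [(0.0, 0.0), (1.0, 1.0)]
labels = assign_pixels_to_planes(pixels, planes)
```

In a real network the embeddings would be dense per-pixel feature maps and the assignment would run as a batched tensor operation; the loop above only shows the nearest-neighbour rule itself.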
