LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network
This work addresses accurate 3D room layout estimation from a single panorama, with applications in computer vision and robotics; it is an incremental improvement over existing methods.
The paper tackles indoor panoramic room layout estimation with a geometry-aware Transformer network that uses horizon-depth together with room height for omnidirectional geometry awareness, and a planar-geometry aware loss function. It reports better performance than state-of-the-art methods on benchmark datasets.
3D room layout estimation from a single panorama using deep neural networks has made great progress. However, previous approaches cannot obtain efficient geometry awareness of the room layout when they use only the latitude of boundaries or only horizon-depth. We show that using horizon-depth along with room height yields omnidirectional-geometry awareness of the room layout in both horizontal and vertical directions. In addition, we propose a planar-geometry aware loss function that uses normals and gradients of normals to supervise the planarity of walls and the turning of corners. We propose an efficient network, LGT-Net, for room layout estimation, which contains a novel Transformer architecture called SWG-Transformer to model geometry relations. SWG-Transformer consists of (Shifted) Window Blocks and Global Blocks, which combine local and global geometry relations. Moreover, we design a novel relative position embedding for the Transformer to enhance its spatial identification ability for the panorama. Experiments show that the proposed LGT-Net achieves better performance than current state-of-the-art (SOTA) methods on benchmark datasets.
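
The geometric quantities named in the abstract can be made concrete with a small NumPy sketch. It is not the authors' implementation. The longitude convention, all function names, the `camera_height` parameter, and the choice of measuring the "gradient of normals" as the angle between neighbouring normals are assumptions made for illustration. The sketch converts per-column horizon-depth into floor-plan points, lifts them to floor and ceiling boundaries using a room height, and computes a normal term and a normal-gradient term between a predicted and a reference layout.

```python
import numpy as np


def horizon_depth_to_xz(depth):
    """Map N horizon-depth samples to N floor-plan points (x, z).

    Assumption: samples are equally spaced in longitude over 360 degrees,
    with the camera at the origin of the floor plan.
    """
    depth = np.asarray(depth, dtype=float)
    n = depth.shape[0]
    lon = (np.arange(n) + 0.5) / n * 2.0 * np.pi - np.pi
    return np.stack([depth * np.sin(lon), depth * np.cos(lon)], axis=-1)


def lift_to_3d(xz, room_height, camera_height):
    """Build 3D floor and ceiling boundaries from floor-plan points.

    Assumption: y points up and the camera sits camera_height above the floor.
    """
    n = xz.shape[0]
    floor_y = np.full(n, -camera_height)
    ceil_y = np.full(n, room_height - camera_height)
    floor = np.stack([xz[:, 0], floor_y, xz[:, 1]], axis=-1)
    ceil = np.stack([xz[:, 0], ceil_y, xz[:, 1]], axis=-1)
    return floor, ceil


def wall_normals(xz):
    """Unit normal of each segment joining consecutive points (wraps around)."""
    direction = np.roll(xz, -1, axis=0) - xz
    # Rotate each segment direction by 90 degrees to get its normal.
    normal = np.stack([direction[:, 1], -direction[:, 0]], axis=-1)
    length = np.linalg.norm(normal, axis=-1, keepdims=True)
    return normal / np.clip(length, 1e-8, None)


def normal_gradients(normals):
    """Angle between neighbouring normals: near 0 on a flat wall, large at a corner."""
    cos = np.sum(normals * np.roll(normals, -1, axis=0), axis=-1)
    return np.arccos(np.clip(cos, -1.0, 1.0))


def planar_geometry_terms(pred_depth, ref_depth):
    """Return (normal term, normal-gradient term) between two layouts.

    Hypothetical formulation: 1 - cosine similarity for normals, and mean
    absolute difference for the gradients of normals.
    """
    pred_n = wall_normals(horizon_depth_to_xz(pred_depth))
    ref_n = wall_normals(horizon_depth_to_xz(ref_depth))
    normal_term = np.mean(1.0 - np.sum(pred_n * ref_n, axis=-1))
    gradient_term = np.mean(
        np.abs(normal_gradients(pred_n) - normal_gradients(ref_n))
    )
    return normal_term, gradient_term


if __name__ == "__main__":
    # Synthetic square room with half-side 1, seen from its centre.
    n = 256
    lon = (np.arange(n) + 0.5) / n * 2.0 * np.pi - np.pi
    square = 1.0 / np.maximum(np.abs(np.sin(lon)), np.abs(np.cos(lon)))
    noisy = square * (1.0 + 0.01 * np.sin(8.0 * lon))
    print(planar_geometry_terms(noisy, square))
```

For a square room sampled this way, the normal-gradient signal stays near zero along each wall and spikes only around the four corners, which is the behaviour the abstract describes as supervising the planarity of walls and the turning of corners.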