Layout Anything: One Transformer for Universal Room Layout Estimation
This work addresses accurate and efficient indoor layout estimation for augmented reality and 3D scene reconstruction. It is an incremental improvement that integrates existing methods with new modules.
The paper tackles indoor layout estimation with Layout Anything, a transformer-based framework that adapts OneFormer to geometric structure prediction. It achieves state-of-the-art pixel errors of 5.43% on LSUN and 4.03% on Matterport3D-Layout, with an inference time of 114 ms.
We present Layout Anything, a transformer-based framework for indoor layout estimation that adapts OneFormer's universal segmentation architecture to geometric structure prediction. Our approach combines OneFormer's task-conditioned queries and contrastive learning with two key modules: (1) a layout degeneration strategy that augments training data through topology-aware transformations while preserving Manhattan-world constraints, and (2) differentiable geometric losses that directly enforce planar consistency and sharp boundary predictions during training. By unifying these components in an end-to-end framework, the model eliminates complex post-processing pipelines while achieving fast inference (114 ms). Extensive experiments demonstrate state-of-the-art performance across standard benchmarks: pixel error (PE) of 5.43% and corner error (CE) of 4.02% on LSUN, PE of 7.04% (CE 5.17%) on Hedau, and PE of 4.03% (CE 3.15%) on Matterport3D-Layout. The framework's combination of geometric awareness and computational efficiency makes it particularly suitable for augmented reality applications and large-scale 3D scene reconstruction.
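The results above are reported as pixel error (PE) and corner error (CE). As a rough illustration, the following minimal Python sketch assumes the conventional LSUN-style definitions: PE is the fraction of pixels whose predicted surface label (e.g. floor, ceiling, left/middle/right wall) differs from the ground truth, and CE is the mean Euclidean distance between corresponding predicted and ground-truth corners, normalized by the image diagonal. It further assumes that labels and corners are already in correspondence. The official benchmark toolkits may add a label-matching step and other details not shown here, and the function names `pixel_error` and `corner_error` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pixel_error(pred: List[List[int]], gt: List[List[int]]) -> float:
    """Fraction of pixels whose predicted surface label differs from ground truth.

    Hypothetical helper: assumes predicted and ground-truth label IDs already
    refer to the same surfaces (no label-matching step is performed).
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("label maps must have the same shape")
    total = sum(len(row) for row in gt)
    if total == 0:
        raise ValueError("label maps must not be empty")
    # Count mismatched pixels row by row.
    wrong = sum(p != g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    return wrong / total


def corner_error(pred_corners: Sequence[Point], gt_corners: Sequence[Point],
                 width: float, height: float) -> float:
    """Mean corner distance normalized by the image diagonal.

    Hypothetical helper: assumes pred_corners[i] corresponds to gt_corners[i].
    """
    if len(pred_corners) != len(gt_corners):
        raise ValueError("corner lists must have the same length")
    if not gt_corners:
        raise ValueError("at least one corner is required")
    diagonal = math.hypot(width, height)
    distances = [math.dist(p, g) for p, g in zip(pred_corners, gt_corners)]
    return sum(distances) / len(distances) / diagonal


# Usage: one of six pixels is mislabeled, and one corner is off by 5 px
# in a 6x8 image whose diagonal is 10 px.
pe = pixel_error([[0, 0, 1], [0, 2, 1]], [[0, 0, 1], [0, 1, 1]])
ce = corner_error([(3.0, 4.0)], [(0.0, 0.0)], width=6, height=8)
print(f"PE = {100 * pe:.2f}%, CE = {100 * ce:.2f}%")
```

Both functions return fractions, so they are multiplied by 100 to match the percentages quoted in the abstract. Benchmark scores are then averaged over all test images.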