CV · May 5

First Shape, Then Meaning: Efficient Geometry and Semantics Learning for Indoor Reconstruction

arXiv:2605.03463
AI Analysis

For researchers in indoor 3D reconstruction, FSTM provides a simpler, faster, and more robust alternative to multi-SDF approaches for joint geometry and semantics learning.

FSTM learns geometry first, from RGB inputs and geometric cues, and estimates the semantic field afterwards. It outperforms multi-SDF methods: it trains 2.3x faster on Replica, is more robust to real-world imperfections on ScanNet++, and achieves higher recall by recovering the surfaces of more objects in the scene.

Neural Surface Reconstruction has become a standard methodology for indoor 3D reconstruction, with Signed Distance Functions (SDFs) proving particularly effective for representing scene geometry. A variety of applications require a detailed understanding of the scene context, driving the need for object-level semantic signals. While recent methods successfully integrate semantic labels, they often inherit the slow training time and limited scalability of multi-SDF learning. In this paper, we introduce FSTM, a unified approach for learning geometry and semantics through a two-step process: a geometry warm-up using RGB inputs and geometric cues, followed by semantic field estimation. By first optimising geometry without semantic supervision, we observe substantial improvements compared to the standard joint optimisation. Rather than relying on specialised modules or complex multi-SDF designs, FSTM shows that a streamlined formulation is sufficient to achieve strong geometric and semantic reconstructions. Experiments on both synthetic and real-world indoor datasets show that our method outperforms multi-SDF approaches. It trains 2.3x faster on Replica, improves robustness to real-world imperfections on ScanNet++, and achieves higher recall by recovering the surfaces of more objects in the scene. The code will be made available at https://remichierchia.github.io/FSTM.
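
The two-step process described in the abstract (geometry warm-up, then semantic field estimation) can be pictured as a loss schedule in which the semantic term is switched off during warm-up. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `LossWeights`, `loss_weights`, `total_loss`, `warmup_steps`, and `semantic_weight` are hypothetical, the unit weights are placeholders, and the abstract does not say whether geometry keeps being optimised in the second stage (this sketch assumes it does).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical per-term weights; the paper's actual values are not given."""
    rgb: float       # photometric loss on rendered colour
    geometry: float  # loss on geometric cues (the specific cues are an assumption)
    semantic: float  # loss on rendered semantic labels


def loss_weights(step: int, warmup_steps: int,
                 semantic_weight: float = 1.0) -> LossWeights:
    """Return loss weights for a two-stage schedule.

    Stage 1 (step < warmup_steps): geometry warm-up, no semantic supervision.
    Stage 2 (step >= warmup_steps): the semantic term is enabled as well.
    """
    if step < 0 or warmup_steps < 0:
        raise ValueError("step and warmup_steps must be non-negative")
    if step < warmup_steps:
        return LossWeights(rgb=1.0, geometry=1.0, semantic=0.0)
    return LossWeights(rgb=1.0, geometry=1.0, semantic=semantic_weight)


def total_loss(rgb_loss: float, geometry_loss: float, semantic_loss: float,
               w: LossWeights) -> float:
    """Weighted sum of the three loss terms."""
    return (w.rgb * rgb_loss
            + w.geometry * geometry_loss
            + w.semantic * semantic_loss)


if __name__ == "__main__":
    # With a 100-step warm-up, the semantic loss is ignored at step 10
    # and contributes from step 100 onwards.
    for step in (10, 100):
        w = loss_weights(step, warmup_steps=100)
        print(step, w, total_loss(0.5, 0.25, 3.0, w))
```

In a joint-optimisation baseline the semantic weight would be non-zero from step 0; the sketch differs only in delaying that term, which is the contrast the abstract draws.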

