CV, AI, LG
Dec 18, 2021

3D Instance Segmentation of MVS Buildings

arXiv:2112.09902v2
30 citations
Originality: Incremental advance
AI Analysis

This work addresses precise 3D building instance segmentation in urban scenes, with applications such as mapping and urban planning. It is rated incremental because it builds on existing segmentation techniques.

The paper tackles 3D instance segmentation of buildings from multi-view stereo data. Its framework combines 2D instance segmentation, mask clustering, and 3D optimization to detect and segment building instances, including attached ones. Quantitative evaluations show advantages over the orthophoto-based method.

We present a novel 3D instance segmentation framework for Multi-View Stereo (MVS) buildings in urban scenes. Unlike existing works focusing on semantic segmentation of urban scenes, the emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images by adding a heightmap and are segmented to obtain all roof instances using a fine-tuned 2D instance segmentation neural network. Instance masks from different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlapping, which can eliminate segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projections and extended to the entire building instances through a Markov random field optimization. A new dataset that contains instance-level annotation for both 3D urban scenes (roofs and buildings) and drone images (roofs) is provided. To the best of our knowledge, it is the first outdoor dataset dedicated to 3D instance segmentation with much more annotations of attached 3D buildings than existing datasets. Quantitative evaluations and ablation studies have shown the effectiveness of all major steps and the advantages of our multi-view framework over the orthophoto-based method.
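The first two steps of the abstract (RGB to RGBH enhancement, then grouping per-view masks into global masks) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `make_rgbh`, `mask_iou`, and `cluster_masks` are hypothetical, the height normalisation to [0, 1] is an assumption, and the plain IoU threshold is a simple stand-in for the paper's clustering, which additionally accounts for occlusion. The masks are assumed to have been projected into a common grid already.

```python
import numpy as np


def make_rgbh(rgb: np.ndarray, height: np.ndarray) -> np.ndarray:
    """Stack an H x W x 3 uint8 RGB image with an H x W heightmap into H x W x 4."""
    if rgb.shape[:2] != height.shape:
        raise ValueError("rgb and height must share the same spatial size")
    h = height.astype(np.float32)
    span = h.max() - h.min()
    # Assumption: rescale height to [0, 1]; a flat heightmap maps to all zeros.
    h = (h - h.min()) / span if span > 0 else np.zeros_like(h)
    rgb01 = rgb.astype(np.float32) / 255.0
    return np.concatenate([rgb01, h[..., None]], axis=-1)


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks of equal shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0


def cluster_masks(masks, iou_thresh: float = 0.5):
    """Group masks whose pairwise IoU reaches the threshold (union-find).

    Stand-in for the paper's occlusion-aware clustering; returns one
    cluster label per input mask.
    """
    parent = list(range(len(masks)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            if mask_iou(masks[i], masks[j]) >= iou_thresh:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(masks))]
```

Masks that end up with the same label would then be merged into one global mask before back-projection onto the 3D surface. The back-projection and the Markov random field step are not sketched here.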
