CV · Dec 27, 2025

SAM 3D for 3D Object Reconstruction from Remote Sensing Images

arXiv:2512.22452v1 · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work addresses scalable urban modeling for remote sensing applications, but it is incremental: it applies an existing foundation model to a new domain.

This paper evaluates SAM 3D, a general-purpose image-to-3D foundation model, for monocular 3D building reconstruction from remote sensing images and finds that it produces more coherent roof geometry and sharper boundaries than TRELLIS on samples from the NYC Urban Dataset.

Monocular 3D building reconstruction from remote sensing imagery is essential for scalable urban modeling, yet existing methods often require task-specific architectures and intensive supervision. This paper presents the first systematic evaluation of SAM 3D, a general-purpose image-to-3D foundation model, for monocular remote sensing building reconstruction. We benchmark SAM 3D against TRELLIS on samples from the NYC Urban Dataset, employing Fréchet Inception Distance (FID) and CLIP-based Maximum Mean Discrepancy (CMMD) as evaluation metrics. Experimental results demonstrate that SAM 3D produces more coherent roof geometry and sharper boundaries than TRELLIS. We further extend SAM 3D to urban scene reconstruction through a segment-reconstruct-compose pipeline, demonstrating its potential for urban scene modeling. We also analyze practical limitations and discuss future research directions. These findings provide practical guidance for deploying foundation models in urban 3D reconstruction and motivate future integration of scene-level structural priors.
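The abstract names the two metrics but not how they are computed. As a rough sketch under stated assumptions: FID fits a Gaussian to each set of image features and reports ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}), while CMMD compares CLIP image embeddings with a kernel two-sample statistic (squared MMD under a Gaussian RBF kernel); lower is better for both. The code assumes the per-image features (Inception-v3 activations for FID, CLIP embeddings for CMMD) have already been extracted from whatever images are being compared, which the abstract does not specify. The function names, the NumPy-only matrix square root, the biased MMD estimator, the bandwidth σ = 10 and the ×1000 scale are illustrative choices (the last two follow common CMMD practice), not details confirmed by this paper.

```python
import numpy as np


def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T


def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((C_a C_b)^{1/2}) = Tr((C_a^{1/2} C_b C_a^{1/2})^{1/2}); the inner matrix is symmetric PSD.
    root_a = _sqrtm_psd(cov_a)
    tr_covmean = np.trace(_sqrtm_psd(root_a @ cov_b @ root_a))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)


def _rbf_kernel(x: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian RBF kernel matrix exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = (x * x).sum(axis=1)[:, None] + (y * y).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def cmmd(emb_a: np.ndarray, emb_b: np.ndarray,
         sigma: float = 10.0, scale: float = 1000.0) -> float:
    """Scaled (biased) squared MMD between two sets of L2-normalised embeddings."""
    # sigma=10 and scale=1000 follow common CMMD practice (assumption, not from this paper).
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    k_aa = _rbf_kernel(a, a, sigma).mean()
    k_bb = _rbf_kernel(b, b, sigma).mean()
    k_ab = _rbf_kernel(a, b, sigma).mean()
    return float(scale * (k_aa + k_bb - 2.0 * k_ab))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))  # stand-in for extracted image features
    print(fid(real, real + 3.0))  # ≈ 72.0
```

Running the script prints roughly 72.0: shifting every feature by 3 in 8 dimensions leaves the covariance unchanged and adds 8 · 3² = 72 to the mean term. The eigendecomposition route avoids a general matrix square root of the non-symmetric product Σ_a Σ_b, relying on Tr((Σ_a Σ_b)^{1/2}) = Tr((Σ_a^{1/2} Σ_b Σ_a^{1/2})^{1/2}).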
