88.9 | CV | Apr 13 | Code
GS4City: Hierarchical Semantic Gaussian Splatting via City-Model Priors
Qilin Zhang, Jinyu Zhu, Olaf Wysocki et al.
Recent semantic 3D Gaussian Splatting (3DGS) methods primarily rely on 2D foundation models, often yielding ambiguous boundaries and limited support for structured urban semantics. While city models such as CityGML encode hierarchically organized semantics together with building geometry, these labels cannot be directly mapped to Gaussian primitives. We present GS4City, a hierarchical semantic Gaussian Splatting method that incorporates city-model priors for urban scene understanding. GS4City derives reliable image-aligned masks from Level of Detail (LoD) 3 CityGML models via two-pass raycasting, explicitly using parent-child relations to validate and recover fine-grained facade elements. It then fuses these geometry-grounded masks with foundation-model predictions to establish scene-consistent instance correspondences, and learns a compact identity encoding for each Gaussian under joint 2D identity supervision and 3D spatial regularization. Experiments on the TUM2TWIN and Gold Coast datasets show that GS4City effectively incorporates structured building semantics into Gaussian scene representations, outperforming existing 2D-driven semantic 3DGS baselines, including LangSplat and Gaga, by up to 15.8 IoU points in coarse building segmentation and 14.2 mIoU points in fine-grained semantic segmentation. By bridging structured city models and photorealistic Gaussian scene representations, GS4City enables semantically queryable and structure-aware urban reconstruction. Code is available at https://github.com/Jinyzzz/GS4City.
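The IoU and mIoU gains quoted above compare predicted masks against reference masks. A minimal sketch of that metric follows, assuming each mask is a flat list of per-pixel class labels. The function names and the toy labels are illustrative and are not taken from the GS4City code.

```python
def iou(pred, ref, cls):
    """Intersection over union of one class between two equally long label lists."""
    inter = sum(1 for p, r in zip(pred, ref) if p == cls and r == cls)
    union = sum(1 for p, r in zip(pred, ref) if p == cls or r == cls)
    return inter / union if union else float("nan")


def mean_iou(pred, ref, classes):
    """Average IoU over the classes that occur in prediction or reference."""
    scores = [iou(pred, ref, c) for c in classes]
    valid = [s for s in scores if s == s]  # NaN != NaN, so this drops absent classes
    return sum(valid) / len(valid)


# Hypothetical facade labels for six pixels.
pred = ["wall", "wall", "window", "door", "wall", "window"]
ref = ["wall", "window", "window", "door", "wall", "wall"]

wall_iou = iou(pred, ref, "wall")  # 2 shared wall pixels out of 4 -> 0.5
miou = mean_iou(pred, ref, ["wall", "window", "door"])
```

A difference of "15.8 IoU points" then means a difference of 0.158 on this 0 to 1 scale.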
CV | Sep 26, 2023
3D Density-Gradient based Edge Detection on Neural Radiance Fields (NeRFs) for Geometric Reconstruction
Miriam Jäger, Boris Jutzi
Generating geometric 3D reconstructions from Neural Radiance Fields (NeRFs) is of great interest. However, accurate and complete reconstructions based on the density values are challenging. The network output depends on the input data, the NeRF network configuration, and the hyperparameters. As a result, the direct usage of density values, e.g. via filtering with global density thresholds, usually requires empirical investigations. Under the assumption that the density increases from non-object to object area, using density gradients derived from relative values is a natural choice. As the density represents a position-dependent parameter, it can be handled anisotropically; processing of the voxelized 3D density field is therefore justified. In this regard, we address geometric 3D reconstructions based on density gradients, where the gradients result from 3D edge detection filters of the first and second derivatives, namely Sobel, Canny and Laplacian of Gaussian. The gradients rely on relative neighboring density values in all directions and are thus independent of absolute magnitudes. Consequently, gradient filters are able to extract edges along a wide density range, almost independent of assumptions and empirical investigations. Our approach demonstrates the capability to achieve geometric 3D reconstructions with high geometric accuracy on object surfaces and remarkable object completeness. Notably, the Canny filter effectively eliminates gaps, delivers a uniform point density, and strikes a favorable balance between correctness and completeness across the scenes.
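The contrast between a global density threshold and a gradient-based criterion can be illustrated on a toy voxel grid. The sketch below uses plain central differences as a stand-in for the Sobel, Canny and Laplacian of Gaussian filters named in the abstract. The grid size, density values and threshold are made up for illustration.

```python
import math


def gradient_magnitude(density, i, j, k):
    """Central-difference gradient magnitude at an interior voxel (i, j, k)."""
    gx = (density[i + 1][j][k] - density[i - 1][j][k]) / 2.0
    gy = (density[i][j + 1][k] - density[i][j - 1][k]) / 2.0
    gz = (density[i][j][k + 1] - density[i][j][k - 1]) / 2.0
    return math.sqrt(gx * gx + gy * gy + gz * gz)


def make_grid(n, low, high):
    """Toy n^3 grid: `low` where i < n // 2 (empty space), `high` elsewhere (object)."""
    return [[[high if i >= n // 2 else low for _ in range(n)]
             for _ in range(n)] for i in range(n)]


faint = make_grid(6, 0.0, 5.0)    # weak absolute densities
dense = make_grid(6, 50.0, 55.0)  # same step, shifted by +50

# A global threshold of 10 keeps nothing in `faint` and everything in `dense`,
# while the gradient at the surface voxel is identical in both grids.
g_faint = gradient_magnitude(faint, 3, 2, 2)
g_dense = gradient_magnitude(dense, 3, 2, 2)
```

Both grids give the same response at the surface and zero response away from it, which is the sense in which the gradients depend only on relative neighboring values.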
CV | Apr 20, 2023
A Comparative Neural Radiance Field (NeRF) 3D Analysis of Camera Poses from HoloLens Trajectories and Structure from Motion
Miriam Jäger, Patrick Hübner, Dennis Haitz et al.
Neural Radiance Fields (NeRFs) are trained using a set of camera poses and associated images as input to estimate density and color values for each position. The position-dependent density learning is of particular interest for photogrammetry, enabling 3D reconstruction by querying and filtering the NeRF coordinate system based on the object density. While traditional methods like Structure from Motion are commonly used for camera pose calculation in pre-processing for NeRFs, the HoloLens offers an interesting interface for extracting the required input data directly. We present a workflow for high-resolution 3D reconstructions almost directly from HoloLens data using NeRFs. Two sources of camera poses are investigated: internal camera poses from the HoloLens trajectory via a server application, and external camera poses from Structure from Motion, both with an enhanced variant applied through pose refinement. Results show that the internal camera poses lead to NeRF convergence with a PSNR of 25 dB with a simple rotation around the x-axis and enable a 3D reconstruction. Pose refinement yields quality comparable to that of external camera poses, resulting in an improved training process with a PSNR of 27 dB and a better 3D reconstruction. Overall, NeRF reconstructions outperform the conventional photogrammetric dense reconstruction using Multi-View Stereo in terms of completeness and level of detail.
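For readers comparing the 25 dB and 27 dB figures, PSNR is a logarithmic function of the mean squared error between a rendered and a reference image. A minimal sketch, assuming images are flat lists of intensities in [0, 1] (the sample values are hypothetical):

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally long intensity lists."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical two-pixel images with an error of 0.1 per pixel: about 20 dB.
value = psnr([0.5, 0.5], [0.6, 0.4])

# A gain of 2 dB corresponds to the MSE shrinking by a factor of 10 ** 0.2.
mse_ratio_for_2_db = 10 ** (2.0 / 10.0)
```

Because of the logarithm, the step from 25 dB to 27 dB corresponds to roughly a 1.58 times lower mean squared error.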
CV | Apr 27, 2023
Combining HoloLens with Instant-NeRFs: Advanced Real-Time 3D Mobile Mapping
Dennis Haitz, Boris Jutzi, Markus Ulrich et al.
This work represents a large step into modern ways of fast 3D reconstruction based on RGB camera images. Utilizing a Microsoft HoloLens 2 as a multisensor platform that includes an RGB camera and an inertial measurement unit for SLAM-based camera-pose determination, we train a Neural Radiance Field (NeRF) as a neural scene representation in real time with the acquired data from the HoloLens. The HoloLens is connected via Wi-Fi to a high-performance PC that is responsible for the training and 3D reconstruction. After the data stream ends, the training is stopped and the 3D reconstruction is initiated, which extracts a point cloud of the scene. With our specialized inference algorithm, five million scene points can be extracted within 1 second. In addition, the point cloud also includes radiometry per point. Our method of 3D reconstruction outperforms grid point sampling with NeRFs by multiple orders of magnitude and can be regarded as a complete real-time 3D reconstruction method in a mobile mapping setup.
CV | May 14, 2022
Corrosion Detection for Industrial Objects: From Multi-Sensor System to 5D Feature Space
Dennis Haitz, Boris Jutzi, Patrick Huebner et al.
Corrosion is a form of damage that often appears on the surface of metal objects used in industrial applications. Such damage can be critical depending on the purpose of the object. Optical testing systems provide a form of non-contact data acquisition, where the acquired data can then be used to analyse the surface of an object. In the field of industrial image processing, this is called surface inspection. We provide a testing setup consisting of a rotary table which rotates the object by 360 degrees, as well as industrial RGB cameras and laser triangulation sensors for the acquisition of 2D and 3D data as our multi-sensor system. These sensors acquire data while the object to be tested takes a full rotation. Furthermore, data augmentation is applied to prepare new data or enhance already acquired data. In order to evaluate the impact of a laser triangulation sensor for corrosion detection, one challenge is to first fuse the data of both domains. After the data fusion process, 5 different channels can be utilized to create a 5D feature space. Besides the red, green and blue channels of the image (1-3), additional range data from the laser triangulation sensor is incorporated (4). As a fifth channel, the same sensor provides additional intensity data (5). With multi-channel image classification, a 5D feature space leads to slightly superior results compared to a 3D feature space composed of only the RGB channels of the image.
CV | Apr 27, 2025 | Code
OPAL: Visibility-aware LiDAR-to-OpenStreetMap Place Recognition via Adaptive Radial Fusion
Shuhao Kang, Martin Y. Liao, Yan Xia et al.
LiDAR place recognition is a critical capability for autonomous navigation and cross-modal localization in large-scale outdoor environments. Existing approaches predominantly depend on pre-built 3D dense maps or aerial imagery, which impose significant storage overhead and lack real-time adaptability. In this paper, we propose OPAL, a novel framework for LiDAR place recognition that leverages OpenStreetMap (OSM) as a lightweight and up-to-date prior. Our key innovation lies in bridging the domain disparity between sparse LiDAR scans and structured OSM data through two carefully designed components: first, a cross-modal visibility mask that identifies observable regions from both modalities to guide feature alignment; second, an adaptive radial fusion module that dynamically consolidates radial features into discriminative global descriptors. Extensive experiments on the KITTI and KITTI-360 datasets demonstrate OPAL's superiority, achieving 15.98% higher recall at a 1 m threshold for top-1 retrieved matches, along with 12x faster inference compared to the state-of-the-art approach. Code and data are publicly available at: https://github.com/kang-1-2-3/OPAL.
16.4 | CV | May 11
BathyFacto: Refraction-Aware Two-Media Neural Radiance Fields for Bathymetry
Markus Brezovsky, Anatol Günthner, Frederik Schulte et al.
Through-water photogrammetry based on UAV imagery enables shallow-water bathymetry, but refraction at the air-water interface violates the straight-ray assumption of Structure-from-Motion and causes systematic depth bias. We present BathyFacto, a refraction-aware two-media extension of Nerfacto integrated into Nerfstudio that targets metrically precise underwater point clouds. BathyFacto uses a shared hash-grid-based density field with a medium-conditioned color head that receives a one-bit medium flag (air or water) and traces each camera ray as two segments: a straight segment in air up to a planar water surface and a refracted segment in water computed via Snell's law with known refractive indices. To allocate samples efficiently across the air-water boundary, we employ a single proposal-network sampler that operates on a virtual straight ray spanning both media, combined with a kinked density wrapper that transparently corrects water-segment positions along the refracted direction before density evaluation. A data adaptation pipeline converts photogrammetric reconstructions to a Nerfstudio-compatible format, estimates the water plane from boundary markers, and provides per-pixel medium masks to gate refraction. We also extend the point cloud export with refraction-corrected backprojection and reversible coordinate transforms to world and global frames. On a simulated two-media scene with known ground truth, BathyFacto with refraction achieves a Cloud-to-Mesh mean distance of 0.06 m and 87% completeness, compared to 0.52 m / 29% for the Nerfacto baseline and 0.36 m / 21% for conventional MVS without refraction correction.
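The two-segment ray described above (straight in air, refracted in water) can be sketched with the vector form of Snell's law. This is an illustration only, not BathyFacto's implementation: the function names, the horizontal water plane at a fixed height, and the refractive indices 1.0 (air) and 1.33 (water) are assumptions made for the example.

```python
import math


def refract(d, n, n1=1.0, n2=1.33):
    """Refract unit direction d at a surface with unit normal n.

    n must point toward the side the ray comes from. Returns None on
    total internal reflection (which cannot occur going from air to water).
    """
    eta = n1 / n2
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    k = eta * cos_i - cos_t
    return tuple(eta * di + k * ni for di, ni in zip(d, n))


def trace_two_media(origin, d, water_z, dist_in_water):
    """Follow a downward ray to the plane z = water_z, refract, then continue."""
    t_hit = (water_z - origin[2]) / d[2]  # assumes d[2] != 0 (ray not horizontal)
    hit = tuple(o + t_hit * di for o, di in zip(origin, d))
    d_water = refract(d, (0.0, 0.0, 1.0))
    return tuple(h + dist_in_water * di for h, di in zip(hit, d_water))


# A ray tilted 30 degrees from nadir bends toward the vertical in water.
theta = math.radians(30.0)
d_air = (math.sin(theta), 0.0, -math.cos(theta))
d_water = refract(d_air, (0.0, 0.0, 1.0))
```

Ignoring the bend and extending `d_air` below the surface would place underwater points along the wrong direction, which is the source of the systematic depth bias mentioned in the abstract.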
CV | Aug 21, 2025 | Code
CM2LoD3: Reconstructing LoD3 Building Models Using Semantic Conflict Maps
Franz Hanke, Antonia Bieringer, Olaf Wysocki et al.
Detailed 3D building models are crucial for urban planning, digital twins, and disaster management applications. While Level of Detail (LoD) 1 and LoD2 building models are widely available, they lack detailed facade elements essential for advanced urban analysis. In contrast, LoD3 models address this limitation by incorporating facade elements such as windows, doors, and underpasses. However, their generation has traditionally required manual modeling, making large-scale adoption challenging. In this contribution, CM2LoD3, we present a novel method for reconstructing LoD3 building models leveraging Conflict Maps (CMs) obtained from ray-to-model-prior analysis. Unlike previous works, we concentrate on semantically segmenting real-world CMs with synthetically generated CMs from our developed Semantic Conflict Map Generator (SCMG). We also observe that additional segmentation of textured models can be fused with CMs using confidence scores to further increase segmentation performance and thus increase 3D reconstruction accuracy. Experimental results demonstrate the effectiveness of our CM2LoD3 method in segmenting and reconstructing building openings, reaching 61% performance with uncertainty-aware fusion of segmented building textures. This research contributes to the advancement of automated LoD3 model reconstruction, paving the way for scalable and efficient 3D city modeling. Our project is available at https://github.com/InFraHank/CM2LoD3
CV | Aug 10, 2025 | Code
GS4Buildings: Prior-Guided Gaussian Splatting for 3D Building Reconstruction
Qilin Zhang, Olaf Wysocki, Boris Jutzi
Recent advances in Gaussian Splatting (GS) have demonstrated its effectiveness in photo-realistic rendering and 3D reconstruction. Among these, 2D Gaussian Splatting (2DGS) is particularly suitable for surface reconstruction due to its flattened Gaussian representation and integrated normal regularization. However, its performance often degrades in large-scale and complex urban scenes with frequent occlusions, leading to incomplete building reconstructions. We propose GS4Buildings, a novel prior-guided Gaussian Splatting method leveraging the ubiquity of semantic 3D building models for robust and scalable building surface reconstruction. Instead of relying on traditional Structure-from-Motion (SfM) pipelines, GS4Buildings initializes Gaussians directly from low-level Level of Detail (LoD) 2 semantic 3D building models. Moreover, we generate prior depth and normal maps from the planar building geometry and incorporate them into the optimization process, providing strong geometric guidance for surface consistency and structural accuracy. We also introduce an optional building-focused mode that limits reconstruction to building regions, achieving a 71.8% reduction in Gaussian primitives and enabling a more efficient and compact representation. Experiments on urban datasets demonstrate that GS4Buildings improves reconstruction completeness by 20.5% and geometric accuracy by 32.8%. These results highlight the potential of semantic building model integration to advance GS-based reconstruction toward real-world urban applications such as smart cities and digital twins. Our project is available at https://github.com/zqlin0521/GS4Buildings.
CV | May 23, 2025 | Code
To Glue or Not to Glue? Classical vs Learned Image Matching for Mobile Mapping Cameras to Textured Semantic 3D Building Models
Simone Gaisbauer, Prabin Gyawali, Qilin Zhang et al.
Feature matching is a necessary step for many computer vision and photogrammetry applications such as image registration, structure-from-motion, and visual localization. Classical handcrafted methods such as SIFT feature detection and description combined with nearest neighbour matching and RANSAC outlier removal have been state-of-the-art for mobile mapping cameras. With recent advances in deep learning, learnable methods have been introduced and proven to have better robustness and performance under complex conditions. Despite their growing adoption, a comprehensive comparison between classical and learnable feature matching methods for the specific task of semantic 3D building camera-to-model matching is still missing. This submission systematically evaluates the effectiveness of different feature-matching techniques in visual localization using textured CityGML LoD2 models. We use standard benchmark datasets (HPatches, MegaDepth-1500) and custom datasets consisting of facade textures and corresponding camera images (terrestrial and drone). For the latter, we evaluate the achievable accuracy of the absolute pose estimated using a Perspective-n-Point (PnP) algorithm, with geometric ground truth derived from geo-referenced trajectory data. The results indicate that the learnable feature matching methods vastly outperform traditional approaches regarding accuracy and robustness on our challenging custom datasets with zero to 12 RANSAC-inliers and zero to 0.16 area under the curve. We believe that this work will foster the development of model-based visual localization methods. Link to the code: https://github.com/simBauer/To_Glue_or_not_to_Glue
CV | Mar 6
EntON: Eigenentropy-Optimized Neighborhood Densification in 3D Gaussian Splatting
Miriam Jäger, Boris Jutzi
We present a novel Eigenentropy-optimized neighborhood densification strategy EntON in 3D Gaussian Splatting (3DGS) for geometrically accurate and high-quality rendered 3D reconstruction. While standard 3DGS produces Gaussians whose centers and surfaces are poorly aligned with the underlying object geometry, surface-focused reconstruction methods frequently sacrifice photometric accuracy. In contrast to the conventional densification strategy, which relies on the magnitude of the view-space position gradient, our approach introduces a geometry-aware strategy to guide adaptive splitting and pruning. Specifically, we compute the 3D shape feature Eigenentropy from the eigenvalues of the covariance matrix in the k-nearest neighborhood of each Gaussian center, which quantifies the local structural order. These Eigenentropy values are integrated into an alternating optimization framework: During the optimization process, the algorithm alternates between (i) standard gradient-based densification, which refines regions via view-space gradients, and (ii) Eigenentropy-aware densification, which preferentially densifies Gaussians in low-Eigenentropy (ordered, flat) neighborhoods to better capture fine geometric details on the object surface, and prunes those in high-Eigenentropy (disordered, spherical) regions. We provide quantitative and qualitative evaluations on two benchmark datasets: the small-scale DTU dataset and the large-scale TUM2TWIN dataset, covering man-made objects and urban scenes. Experiments demonstrate that our Eigenentropy-aware alternating densification strategy improves geometric accuracy by up to 33% and rendering quality by up to 7%, while reducing the number of Gaussians by up to 50% and training time by up to 23%. Overall, EntON achieves a favorable balance between geometric accuracy, rendering quality and efficiency by avoiding unnecessary scene expansion.
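To make the low versus high Eigenentropy distinction concrete, the sketch below evaluates the feature from three given eigenvalues. It assumes the commonly used definition (Shannon entropy of the eigenvalues normalized to sum to one), which the abstract does not spell out, and it takes the eigenvalues as input rather than computing the neighborhood covariance. The example eigenvalues are hypothetical.

```python
import math


def eigenentropy(eigenvalues):
    """Shannon entropy of eigenvalues normalized to sum to 1 (assumed definition)."""
    total = sum(eigenvalues)
    normalized = [lam / total for lam in eigenvalues]
    # Terms with a zero eigenvalue contribute nothing (limit of x * ln x as x -> 0).
    return -sum(x * math.log(x) for x in normalized if x > 0.0)


spherical = eigenentropy([1.0, 1.0, 1.0])   # points scattered in all directions
flat = eigenentropy([1.0, 1.0, 0.0])        # points spread on a plane
linear = eigenentropy([1.0, 0.0, 0.0])      # points along a line
```

Under this definition the spherical case reaches the maximum ln 3, the planar case gives ln 2, and the linear case gives 0, matching the abstract's reading of low values as ordered and high values as disordered.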
CV | Dec 22, 2023
Density Uncertainty Quantification with NeRF-Ensembles: Impact of Data and Scene Constraints
Miriam Jäger, Steven Landgraf, Boris Jutzi
In the fields of computer graphics, computer vision and photogrammetry, Neural Radiance Fields (NeRFs) are a major topic driving current research and development. However, the quality of NeRF-generated 3D scene reconstructions and subsequent surface reconstructions heavily relies on the network output, particularly the density. Regarding this critical aspect, we propose to utilize NeRF-Ensembles that provide a density uncertainty estimate alongside the mean density. We demonstrate that data constraints such as low-quality images and poses lead to a degradation of the training process, increased density uncertainty and decreased predicted density. Even with high-quality input data, the density uncertainty varies based on scene constraints such as acquisition constellations, occlusions and material properties. NeRF-Ensembles not only provide a tool for quantifying the uncertainty but exhibit two promising advantages: Enhanced robustness and artifact removal. Through the utilization of NeRF-Ensembles instead of single NeRFs, small outliers are removed, yielding a smoother output with improved completeness of structures. Furthermore, applying percentile-based thresholds on density uncertainty outliers proves to be effective for the removal of large (foggy) artifacts in post-processing. We evaluate our methodology on three different datasets: (i) synthetic benchmark dataset, (ii) real benchmark dataset, (iii) real data under realistic recording conditions and sensors.
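The ensemble idea above (a mean density plus an uncertainty per position, followed by a percentile-based cut) can be sketched with the standard library. The use of the standard deviation as the uncertainty measure, the 75th percentile, and the sample densities are assumptions for illustration and are not taken from the paper.

```python
import statistics


def ensemble_stats(densities_per_member):
    """Per-location mean and standard deviation across ensemble members.

    densities_per_member: one list of densities per NeRF, all queried
    at the same positions.
    """
    per_location = list(zip(*densities_per_member))
    means = [statistics.fmean(v) for v in per_location]
    stds = [statistics.pstdev(v) for v in per_location]
    return means, stds


def keep_mask(stds, percentile=75):
    """True where the uncertainty is at or below the given percentile."""
    cuts = statistics.quantiles(stds, n=100, method="inclusive")
    threshold = cuts[percentile - 1]
    return [s <= threshold for s in stds]


# Hypothetical densities from three ensemble members at four positions.
members = [
    [10.0, 0.1, 5.0, 0.0],
    [10.0, 0.2, 9.0, 0.0],
    [10.0, 0.3, 1.0, 0.0],
]
means, stds = ensemble_stats(members)
mask = keep_mask(stds)  # the third position, where members disagree, is dropped
```

Positions where the members agree keep a low standard deviation regardless of the density level, while the position with strongly differing predictions is flagged as uncertain and removed.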
CV | Apr 7, 2025
Texture2LoD3: Enabling LoD3 Building Reconstruction With Panoramic Images
Wenzhao Tang, Weihang Li, Xiucheng Liang et al.
Despite recent advancements in surface reconstruction, Level of Detail (LoD) 3 building reconstruction remains an unresolved challenge. The main issue pertains to the object-oriented modelling paradigm, which requires georeferencing, watertight geometry, facade semantics, and low-poly representation, in contrast to unstructured mesh-oriented models. In Texture2LoD3, we introduce a novel method leveraging the ubiquity of 3D building model priors and panoramic street-level images, enabling the reconstruction of LoD3 building models. We observe that prior low-detail building models can serve as valid planar targets for ortho-rectifying street-level panoramic images. Moreover, deploying segmentation on accurately textured low-level building surfaces supports maintaining essential georeferencing, watertight geometry, and low-poly representation for LoD3 reconstruction. In the absence of LoD3 validation data, we additionally introduce the ReLoD3 dataset, on which we experimentally demonstrate that our method leads to improved facade segmentation accuracy by 11% and can replace costly manual projections. We believe that Texture2LoD3 can scale the adoption of LoD3 models, opening applications in estimating building solar potential or enhancing autonomous driving simulations. The project website, code, and data are available here: https://wenzhaotang.github.io/Texture2LoD3/.
CV | May 3, 2024
HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2
Miriam Jäger, Theodor Kapler, Michael Feßenbecker et al.
In the fields of photogrammetry, computer vision and computer graphics, the task of neural 3D scene reconstruction has led to the exploration of various techniques. Among these, 3D Gaussian Splatting stands out for its explicit representation of scenes using 3D Gaussians, making it appealing for tasks like 3D point cloud extraction and surface reconstruction. Motivated by its potential, we address the domain of 3D scene reconstruction, aiming to leverage the capabilities of the Microsoft HoloLens 2 for instant 3D Gaussian Splatting. We present HoloGS, a novel workflow utilizing HoloLens sensor data, which bypasses the need for pre-processing steps like Structure from Motion by instantly accessing the required input data, i.e., the images, camera poses and the point cloud from depth sensing. We provide comprehensive investigations, including the training process and the rendering quality, assessed through the Peak Signal-to-Noise Ratio, and the geometric 3D accuracy of the densified point cloud from Gaussian centers, measured by Chamfer Distance. We evaluate our approach on two self-captured scenes: an outdoor scene of a cultural heritage statue and an indoor scene of a fine-structured plant. Our results show that the HoloLens data, including RGB images, corresponding camera poses, and depth-sensing-based point clouds to initialize the Gaussians, are suitable as input for 3D Gaussian Splatting.
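Chamfer Distance, used above to score the Gaussian-center point cloud against a reference, has several variants in the literature. The sketch below shows one common form (sum of the two mean nearest-neighbor Euclidean distances) with a brute-force search. Which variant HoloGS uses is not stated in the abstract, and the toy points are hypothetical.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 3D point lists.

    Variant assumed here: mean nearest-neighbor distance from a to b
    plus the same from b to a, using unsquared Euclidean distances.
    """
    def mean_nearest(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(a, b) + mean_nearest(b, a)


# Hypothetical reconstructed and reference points.
reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(reconstructed, reference)  # 0.5 + 0.5 = 1.0
```

The brute-force search is quadratic in the number of points, so real point clouds would normally use a spatial index such as a k-d tree instead.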
GR | Feb 20, 2025
CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting
Qilin Zhang, Olaf Wysocki, Steffen Urban et al.
3D Gaussian Splatting (3DGS) has shown significant advantages in novel view synthesis (NVS), particularly in achieving high rendering speeds and high-quality results. However, its geometric accuracy in 3D reconstruction remains limited due to the lack of explicit geometric constraints during optimization. This paper introduces CDGS, a confidence-aware depth regularization approach developed to enhance 3DGS. We leverage multi-cue confidence maps of monocular depth estimation and sparse Structure-from-Motion depth to adaptively adjust depth supervision during the optimization process. Our method demonstrates improved geometric detail preservation in early training stages and achieves competitive performance in both NVS quality and geometric accuracy. Experiments on the publicly available Tanks and Temples benchmark dataset show that our method achieves more stable convergence behavior and more accurate geometric reconstruction results, with improvements of up to 2.31 dB in PSNR for NVS and consistently lower geometric errors in M3C2 distance metrics. Notably, our method reaches comparable F-scores to the original 3DGS with only 50% of the training iterations. We expect this work will facilitate the development of efficient and accurate 3D reconstruction systems for real-world applications such as digital twin creation, heritage preservation, or forestry applications.
CV | May 12, 2025
TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset
Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath et al.
Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, keeping models up to date, and ensuring seamless interoperability with downstream tasks. Current datasets are usually limited to one part of the processing chain, hampering comprehensive UDT validation. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations boasting 32 data subsets over roughly 100,000 m² and currently 767 GB of data. By ensuring georeferenced indoor-outdoor acquisition, high accuracy, and multimodal data integration, the benchmark supports robust analysis of sensors and the development of advanced reconstruction methods. Additionally, we explore downstream tasks demonstrating the potential of TUM2TWIN, including novel view synthesis of NeRF and Gaussian Splatting, solar potential analysis, point cloud semantic segmentation, and LoD3 building reconstruction. We are convinced this contribution lays a foundation for overcoming current limitations in UDT creation, fostering new research directions and practical solutions for smarter, data-driven urban environments. The project is available under: https://tum2t.win
CV | Jan 29, 2025
FeatureGS: Eigenvalue-Feature Optimization in 3D Gaussian Splatting for Geometrically Accurate and Artifact-Reduced Reconstruction
Miriam Jäger, Markus Hillemann, Boris Jutzi
3D Gaussian Splatting (3DGS) has emerged as a powerful approach for 3D scene reconstruction using 3D Gaussians. However, neither the centers nor surfaces of the Gaussians are accurately aligned to the object surface, complicating their direct use in point cloud and mesh reconstruction. Additionally, 3DGS typically produces floater artifacts, increasing the number of Gaussians and storage requirements. To address these issues, we present FeatureGS, which incorporates an additional geometric loss term based on an eigenvalue-derived 3D shape feature into the optimization process of 3DGS. The goal is to improve geometric accuracy and enhance properties of planar surfaces with reduced structural entropy in local 3D neighborhoods. We present four alternative formulations for the geometric loss term based on 'planarity' of Gaussians, as well as 'planarity', 'omnivariance', and 'eigenentropy' of Gaussian neighborhoods. We provide quantitative and qualitative evaluations on 15 scenes of the DTU benchmark dataset focusing on the following key aspects: geometric accuracy and artifact reduction, measured by the Chamfer distance, and memory efficiency, evaluated by the total number of Gaussians. Additionally, rendering quality is monitored by Peak Signal-to-Noise Ratio. FeatureGS achieves a 30% improvement in geometric accuracy, reduces the number of Gaussians by 90%, and suppresses floater artifacts, while maintaining comparable photometric rendering quality. The geometric loss with 'planarity' from Gaussians provides the highest geometric accuracy, while 'omnivariance' in Gaussian neighborhoods reduces floater artifacts and the number of Gaussians the most. This makes FeatureGS a strong method for geometrically accurate, artifact-reduced and memory-efficient 3D scene reconstruction, enabling the direct use of Gaussian centers for geometric representation.
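Two of the eigenvalue-derived features named above can be written down directly. The sketch assumes the definitions commonly used for 3D shape features, planarity as (l2 - l3) / l1 and omnivariance as the cube root of l1 * l2 * l3 with l1 >= l2 >= l3. Whether FeatureGS normalizes the eigenvalues first, and how the features enter the loss, is not stated in the abstract, so none of that is shown here.

```python
def sorted_eigenvalues(eigenvalues):
    """Return the three eigenvalues in descending order (l1 >= l2 >= l3)."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    return l1, l2, l3


def planarity(eigenvalues):
    """(l2 - l3) / l1: close to 1 for a flat disc, 0 for a sphere or a line."""
    l1, l2, l3 = sorted_eigenvalues(eigenvalues)
    return (l2 - l3) / l1


def omnivariance(eigenvalues):
    """Cube root of the eigenvalue product: 0 as soon as one axis collapses."""
    l1, l2, l3 = sorted_eigenvalues(eigenvalues)
    return (l1 * l2 * l3) ** (1.0 / 3.0)


# Hypothetical eigenvalues for a flat and a spherical neighborhood.
disc = [1.0, 1.0, 0.0]
sphere = [1.0, 1.0, 1.0]
```

With these definitions a perfectly flat neighborhood has planarity 1 and omnivariance 0, while an isotropic one has planarity 0, which is consistent with a loss that rewards flat, surface-aligned structure.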