CVDec 12, 2025
Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent TransmittanceJan U. Müller, Robin Tim Landsgesell, Leif Van Holland et al.
The recent success of 3D Gaussian Splatting (3DGS) has reshaped novel view synthesis by enabling fast optimization and real-time rendering of high-quality radiance fields. However, it relies on simplified, order-dependent alpha blending and coarse approximations of the density integral within the rasterizer, thereby limiting its ability to render complex, overlapping semi-transparent objects. In this paper, we extend rasterization-based rendering of 3D Gaussian representations with a novel method for high-fidelity transmittance computation, entirely avoiding the need for ray tracing or per-pixel sample sorting. Building on prior work in moment-based order-independent transparency, our key idea is to characterize the density distribution along each camera ray with a compact and continuous representation based on statistical moments. To this end, we analytically derive and compute a set of per-pixel moments from all contributing 3D Gaussians. From these moments, a continuous transmittance function is reconstructed for each ray, which is then independently sampled within each Gaussian. As a result, our method bridges the gap between rasterization and physical accuracy by modeling light attenuation in complex translucent media, significantly improving overall reconstruction and rendering quality.
CVMay 2, 2022
Incomplete Gamma Kernels: Generalizing Locally Optimal Projection OperatorsPatrick Stotko, Michael Weinmann, Reinhard Klein
We present incomplete gamma kernels, a generalization of Locally Optimal Projection (LOP) operators. In particular, we reveal the relation of the classical localized $ L_1 $ estimator, used in the LOP operator for point cloud denoising, to the common Mean Shift framework via a novel kernel. Furthermore, we generalize this result to a whole family of kernels that are built upon the incomplete gamma function and each represents a localized $ L_p $ estimator. By deriving various properties of the kernel family concerning distributional, Mean Shift induced, and other aspects such as strict positive definiteness, we obtain a deeper understanding of the operator's projection behavior. From these theoretical insights, we illustrate several applications ranging from an improved Weighted LOP (WLOP) density weighting scheme and a more accurate Continuous LOP (CLOP) kernel approximation to the definition of a novel set of robust loss functions. These incomplete gamma losses include the Gaussian and LOP loss as special cases and can be applied to various tasks including normal filtering. Furthermore, we show that the novel kernels can be included as priors into neural networks. We demonstrate the effects of each application in a range of quantitative and qualitative experiments that highlight the benefits induced by our modifications.
CVNov 25, 2022
Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale EnvironmentsLeif Van Holland, Patrick Stotko, Stefan Krumpen et al.
Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to scenarios with larger dynamic environments beyond a fixed size of a few square-meters remains challenging. In this paper, we aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale with both static and dynamic scene entities at practical bandwidth requirements only based on light-weight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system which is built upon a novel hybrid volumetric scene representation in terms of the combination of a voxel-based scene representation for the static contents, that not only stores the reconstructed surface geometry but also contains information about the object semantics as well as their accumulated dynamic movement over time, and a point-cloud-based representation for dynamic scene parts, where the respective separation from static parts is achieved based on semantic and instance information extracted for the input frames. With an independent yet simultaneous streaming of both static and dynamic content, where we seamlessly integrate potentially moving but currently static scene entities in the static model until they are becoming dynamic again, as well as the fusion of static and dynamic data at the remote client, our system is able to achieve VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our novel approach in terms of visual quality, performance, and ablation studies regarding involved design choices.
CVOct 31, 2023
FPO++: Efficient Encoding and Rendering of Dynamic Neural Radiance Fields by Analyzing and Enhancing Fourier PlenOctreesSaskia Rabich, Patrick Stotko, Reinhard Klein
Fourier PlenOctrees have shown to be an efficient representation for real-time rendering of dynamic Neural Radiance Fields (NeRF). Despite its many advantages, this method suffers from artifacts introduced by the involved compression when combining it with recent state-of-the-art techniques for training the static per-frame NeRF models. In this paper, we perform an in-depth analysis of these artifacts and leverage the resulting insights to propose an improved representation. In particular, we present a novel density encoding that adapts the Fourier-based compression to the characteristics of the transfer function used by the underlying volume rendering procedure and leads to a substantial reduction of artifacts in the dynamic model. Furthermore, we show an augmentation of the training data that relaxes the periodicity assumption of the compression. We demonstrate the effectiveness of our enhanced Fourier PlenOctrees in the scope of quantitative and qualitative evaluations on synthetic and real-world scenes.
CVOct 16, 2023
TraM-NeRF: Tracing Mirror and Near-Perfect Specular Reflections through Neural Radiance FieldsLeif Van Holland, Ruben Bliersbach, Jan U. Müller et al.
Implicit representations like Neural Radiance Fields (NeRF) showed impressive results for photorealistic rendering of complex scenes with fine details. However, ideal or near-perfectly specular reflecting objects such as mirrors, which are often encountered in various indoor scenes, impose ambiguities and inconsistencies in the representation of the reconstructed scene leading to severe artifacts in the synthesized renderings. In this paper, we present a novel reflection tracing method tailored for the involved volume rendering within NeRF that takes these mirror-like objects into account while avoiding the cost of straightforward but expensive extensions through standard path tracing. By explicitly modeling the reflection behavior using physically plausible materials and estimating the reflected radiance with Monte-Carlo methods within the volume rendering formulation, we derive efficient strategies for importance sampling and the transmittance computation along rays from only few samples. We show that our novel method enables the training of consistent representations of such challenging scenes and achieves superior results in comparison to previous state-of-the-art approaches.
GRMay 19
PiG-Avatar: Hierarchical Neural-Field-Guided Gaussian AvatarsJulian Kaltheuner, Jan Spindler, Sina Kitz et al.
Existing Gaussian avatar methods typically parameterize geometry on a body-template surface, which entangles the avatar's representation space with the template's deformation space and limits the capture of layered, off-body, and non-rigid clothing geometry. We present PiG-Avatar, which addresses this limitation by using the parametric body model solely for kinematic transport, while representing the avatar as Gaussians anchored in a volumetric canonical space governed by a continuous neural field. This decouples representation from template topology, avoiding the geometric constraints of surface-based parameterizations. Kinematic coherence is maintained through 3D barycentric anchor transport, which guides motion without constraining geometry and allows anchors to deviate freely from the template surface, yielding dense, stable temporal surface correspondences by construction. To make this unconstrained formulation tractable, we introduce dual-level spatially coherent optimization, combining Sobolev-preconditioned neural-field updates with a novel KNN-based preconditioning of canonical anchor geometry. Together, these mechanisms induce an emergent self-organization of anchor density: anchors migrate toward regions of high curvature, appearance variation, and non-coherent motion without explicit heuristics. As a result, complex clothing geometry and layered surfaces emerge as natural, high-fidelity outputs. This single representation further supports hierarchical reconstruction across multiple levels of detail, with coarse-level supervision propagating to finer levels through the shared field and coupled anchor graph. On established benchmarks featuring subjects with complex clothing and challenging non-rigid motion, PiG-Avatar achieves state-of-the-art rendering quality, generalizes robustly to imperfect body model initialization, and renders in real time across all detail levels.
ROMar 26
RHINO-AR: An Augmented Reality Exhibit for Teaching Mobile Robotics Concepts in MuseumsNils Dengler, Tim Graf, Leif Van Holland et al.
We present RHINO-AR, an interactive Augmented Reality (AR) museum exhibit that reintroduces the historical mobile robot RHINO into its original exhibition environment at the Deutsches Museum Bonn. The system builds on our previous work RHINO-VR, which reconstructed the robot and the environment in virtual reality. Although this created an engaging experience, it also revealed an important limitation, because visitors were separated from the real exhibition space and from the physical robot on display. RHINO-AR addresses this reality gap by placing a virtual reconstruction of the robot directly into the real museum space. Implemented on a Magic Leap~2 headset using Unity, our system combines real-time environment meshing with interactive visualizations of LiDAR sensing, traversability, and path planning to make otherwise invisible robotics processes understandable to non-expert visitors. We evaluated RHINO-AR in a two-day museum study with 22 participants, assessing usability, technical performance, satisfaction, conceptual understanding, and preference comparison to RHINO-VR. The results show that RHINO-AR was well received, effectively conveyed key navigation concepts, and generally preferred over the VR exhibit due to its stronger physical grounding and increased realism.
CVFeb 25
Neu-PiG: Neural Preconditioned Grids for Fast Dynamic Surface Reconstruction on Long SequencesJulian Kaltheuner, Hannah Dröge, Markus Plack et al.
Temporally consistent surface reconstruction of dynamic 3D objects from unstructured point cloud data remains challenging, especially for very long sequences. Existing methods either optimize deformations incrementally, risking drift and requiring long runtimes, or rely on complex learned models that demand category-specific training. We present Neu-PiG, a fast deformation optimization method based on a novel preconditioned latent-grid encoding that distributes spatial features parameterized on the position and normal direction of a keyframe surface. Our method encodes entire deformations across all time steps at various spatial scales into a multi-resolution latent grid, parameterized by the position and normal direction of a reference surface from a single keyframe. This latent representation is then augmented for time modulation and decoded into per-frame 6-DoF deformations via a lightweight multilayer perceptron (MLP). To achieve high-fidelity, drift-free surface reconstructions in seconds, we employ Sobolev preconditioning during gradient-based training of the latent space, completely avoiding the need for any explicit correspondences or further priors. Experiments across diverse human and animal datasets demonstrate that Neu-PiG outperforms state-the-art approaches, offering both superior accuracy and scalability to long sequences while running at least 60x faster than existing training-free methods and achieving inference speeds on the same order as heavy pretrained models.
CVJan 30, 2025
ROSA: Reconstructing Object Shape and Appearance Textures by Adaptive Detail TransferJulian Kaltheuner, Patrick Stotko, Reinhard Klein
Reconstructing an object's shape and appearance in terms of a mesh textured by a spatially-varying bidirectional reflectance distribution function (SVBRDF) from a limited set of images captured under collocated light is an ill-posed problem. Previous state-of-the-art approaches either aim to reconstruct the appearance directly on the geometry or additionally use texture normals as part of the appearance features. However, this requires detailed but inefficiently large meshes, that would have to be simplified in a post-processing step, or suffers from well-known limitations of normal maps such as missing shadows or incorrect silhouettes. Another limiting factor is the fixed and typically low resolution of the texture estimation resulting in loss of important surface details. To overcome these problems, we present ROSA, an inverse rendering method that directly optimizes mesh geometry with spatially adaptive mesh resolution solely based on the image data. In particular, we refine the mesh and locally condition the surface smoothness based on the estimated normal texture and mesh curvature. In addition, we enable the reconstruction of fine appearance details in high-resolution textures through a pioneering tile-based method that operates on a single pre-trained decoder network but is not limited by the network output resolution.
CVJan 7, 2025
NeRFs are Mirror Detectors: Using Structural Similarity for Multi-View Mirror Scene Reconstruction with 3D Surface PrimitivesLeif Van Holland, Michael Weinmann, Jan U. Müller et al.
While neural radiance fields (NeRF) led to a breakthrough in photorealistic novel view synthesis, handling mirroring surfaces still denotes a particular challenge as they introduce severe inconsistencies in the scene representation. Previous attempts either focus on reconstructing single reflective objects or rely on strong supervision guidance in terms of additional user-provided annotations of visible image regions of the mirrors, thereby limiting the practical usability. In contrast, in this paper, we present NeRF-MD, a method which shows that NeRFs can be considered as mirror detectors and which is capable of reconstructing neural radiance fields of scenes containing mirroring surfaces without the need for prior annotations. To this end, we first compute an initial estimate of the scene geometry by training a standard NeRF using a depth reprojection loss. Our key insight lies in the fact that parts of the scene corresponding to a mirroring surface will still exhibit a significant photometric inconsistency, whereas the remaining parts are already reconstructed in a plausible manner. This allows us to detect mirror surfaces by fitting geometric primitives to such inconsistent regions in this initial stage of the training. Using this information, we then jointly optimize the radiance field and mirror geometry in a second training stage to refine their quality. We demonstrate the capability of our method to allow the faithful detection of mirrors in the scene as well as the reconstruction of a single consistent scene representation, and demonstrate its potential in comparison to baseline and mirror-aware approaches.
CVMar 5
Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera SetupsLeif Van Holland, Domenic Zingsheim, Mana Takhsha et al.
High-quality 3D streaming from multiple cameras is crucial for immersive experiences in many AR/VR applications. The limited number of views - often due to real-time constraints - leads to missing information and incomplete surfaces in the rendered images. Existing approaches typically rely on simple heuristics for the hole filling, which can result in inconsistencies or visual artifacts. We propose to complete the missing textures using a novel, application-targeted inpainting method independent of the underlying representation as an image-based post-processing step after the novel view rendering. The method is designed as a standalone module compatible with any calibrated multi-camera system. For this we introduce a multi-view aware, transformer-based network architecture using spatio-temporal embeddings to ensure consistency across frames while preserving fine details. Additionally, our resolution-independent design allows adaptation to different camera setups, while an adaptive patch selection strategy balances inference speed and quality, allowing real-time performance. We evaluate our approach against state-of-the-art inpainting techniques under the same real-time constraints and demonstrate that our model achieves the best trade-off between quality and speed, outperforming competitors in both image and video-based metrics.
CVSep 22, 2025
Preconditioned Deformation GridsJulian Kaltheuner, Alexander Oebel, Hannah Droege et al.
Dynamic surface reconstruction of objects from point cloud sequences is a challenging field in computer graphics. Existing approaches either require multiple regularization terms or extensive training data which, however, lead to compromises in reconstruction accuracy as well as over-smoothing or poor generalization to unseen objects and motions. To address these lim- itations, we introduce Preconditioned Deformation Grids, a novel technique for estimating coherent deformation fields directly from unstructured point cloud sequences without requiring or forming explicit correspondences. Key to our approach is the use of multi-resolution voxel grids that capture the overall motion at varying spatial scales, enabling a more flexible deformation representation. In conjunction with incorporating grid-based Sobolev preconditioning into gradient-based optimization, we show that applying a Chamfer loss between the input point clouds as well as to an evolving template mesh is sufficient to obtain accurate deformations. To ensure temporal consistency along the object surface, we include a weak isometry loss on mesh edges which complements the main objective without constraining deformation fidelity. Extensive evaluations demonstrate that our method achieves superior results, particularly for long sequences, compared to state-of-the-art techniques.
CVDec 13, 2019
Bonn Activity Maps: Dataset DescriptionJulian Tanke, Oh-Hun Kwon, Patrick Stotko et al.
The key prerequisite for accessing the huge potential of current machine learning techniques is the availability of large databases that capture the complex relations of interest. Previous datasets are focused on either 3D scene representations with semantic information, tracking of multiple persons and recognition of their actions, or activity recognition of a single person in captured 3D environments. We present Bonn Activity Maps, a large-scale dataset for human tracking, activity recognition and anticipation of multiple persons. Our dataset comprises four different scenes that have been recorded by time-synchronized cameras each only capturing the scene partially, the reconstructed 3D models with semantic annotations, motion trajectories for individual people including 3D human poses as well as human activity annotations. We utilize the annotations to generate activity likelihoods on the 3D models called activity maps.
HCAug 8, 2019
Efficient 3D Reconstruction and Streaming for Group-Scale Multi-Client Live TelepresencePatrick Stotko, Stefan Krumpen, Michael Weinmann et al.
Sharing live telepresence experiences for teleconferencing or remote collaboration receives increasing interest with the recent progress in capturing and AR/VR technology. Whereas impressive telepresence systems have been proposed on top of on-the-fly scene capture, data transmission and visualization, these systems are restricted to the immersion of single or up to a low number of users into the respective scenarios. In this paper, we direct our attention on immersing significantly larger groups of people into live-captured scenes as required in education, entertainment or collaboration scenarios. For this purpose, rather than abandoning previous approaches, we present a range of optimizations of the involved reconstruction and streaming components that allow the immersion of a group of more than 24 users within the same scene - which is about a factor of 6 higher than in previous work - without introducing further latency or changing the involved consumer hardware setup. We demonstrate that our optimized system is capable of generating high-quality scene reconstructions as well as providing an immersive viewing experience to a large group of people within these live-captured scenes.
HCAug 8, 2019
A VR System for Immersive Teleoperation and Live Exploration with a Mobile RobotPatrick Stotko, Stefan Krumpen, Max Schwarz et al.
Applications like disaster management and industrial inspection often require experts to enter contaminated places. To circumvent the need for physical presence, it is desirable to generate a fully immersive individual live teleoperation experience. However, standard video-based approaches suffer from a limited degree of immersion and situation awareness due to the restriction to the camera view, which impacts the navigation. In this paper, we present a novel VR-based practical system for immersive robot teleoperation and scene exploration. While being operated through the scene, a robot captures RGB-D data that is streamed to a SLAM-based live multi-client telepresence system. Here, a global 3D model of the already captured scene parts is reconstructed and streamed to the individual remote user clients where the rendering for e.g. head-mounted display devices (HMDs) is performed. We introduce a novel lightweight robot client component which transmits robot-specific data and enables a quick integration into existing robotic systems. This way, in contrast to first-person exploration systems, the operators can explore and navigate in the remote site completely independent of the current position and view of the capturing robot, complementing traditional input devices for teleoperation. We provide a proof-of-concept implementation and demonstrate the capabilities as well as the performance of our system regarding interactive object measurements and bandwidth-efficient data streaming and visualization. Furthermore, we show its benefits over purely video-based teleoperation in a user study revealing a higher degree of situation awareness and a more precise navigation in challenging environments.
HCMay 9, 2018
SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live TelepresencePatrick Stotko, Stefan Krumpen, Matthias B. Hullin et al.
Real-time 3D scene reconstruction from RGB-D sensor data, as well as the exploration of such data in VR/AR settings, has seen tremendous progress in recent years. The combination of both these components into telepresence systems, however, comes with significant technical challenges. All approaches proposed so far are extremely demanding on input and output devices, compute resources and transmission bandwidth, and they do not reach the level of immediacy required for applications such as remote collaboration. Here, we introduce what we believe is the first practical client-server system for real-time capture and many-user exploration of static 3D scenes. Our system is based on the observation that interactive frame rates are sufficient for capturing and reconstruction, and real-time performance is only required on the client site to achieve lag-free view updates when rendering the 3D model. Starting from this insight, we extend previous voxel block hashing frameworks by overcoming internal dependencies and introducing, to the best of our knowledge, the first thread-safe GPU hash map data structure that is robust under massively concurrent retrieval, insertion and removal of entries on a thread level. We further propose a novel transmission scheme for volume data that is specifically targeted to Marching Cubes geometry reconstruction and enables a 90% reduction in bandwidth between server and exploration clients. The resulting system poses very moderate requirements on network bandwidth, latency and client-side computation, which enables it to rely entirely on consumer-grade hardware, including mobile devices. We demonstrate that our technique achieves state-of-the-art representation accuracy while providing, for any number of clients, an immersive and fluid lag-free viewing experience even during network outages.