Matthias B. Hullin

h-index: 27

11 papers

400 citations

Novelty: 47%

AI Score: 47

Ranked #55,878 of 205,806 authors (top 27%); #20,579 in CV (top 35%)

11 Papers

17.0 CV Apr 13

3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis

Stefan Schulz, Fernando Edelstein, Hannah Dröge et al.

Real-time free-viewpoint rendering requires balancing multi-camera redundancy with the latency constraints of interactive applications. We address this challenge by combining lightweight geometry with learning and propose 3DTV, a feedforward network for real-time sparse-view interpolation. A Delaunay-based triplet selection ensures angular coverage for each target view. Building on this, we introduce a pose-aware depth module that estimates a coarse-to-fine depth pyramid, enabling efficient feature reprojection and occlusion-aware blending. Unlike methods that require scene-specific optimization, 3DTV runs feedforward without retraining, making it practical for AR/VR, telepresence, and interactive applications. Our experiments on challenging multi-view video datasets demonstrate that 3DTV consistently achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view baselines. Crucially, 3DTV avoids explicit proxies, enabling robust rendering across diverse scenes. This makes it a practical solution for low-latency multi-view streaming and interactive rendering. Project Page: https://stefanmschulz.github.io/3DTV_webpage/

GR Jul 10, 2025

Capture Stage Matting: Challenges, Approaches, and Solutions for Offline and Real-Time Processing

Hannah Dröge, Janelle Pfeifer, Saskia Rabich et al.

Capture stages are high-end sources of state-of-the-art recordings for downstream applications in movies, games, and other media. One crucial step in almost all pipelines is matting, i.e., separating captured performances from the background. While common matting algorithms deliver remarkable performance in other applications like teleconferencing and mobile entertainment, we found that they struggle significantly with the peculiarities of capture stage content. The goal of our work is to share insights into those challenges: we present a curated list of these characteristics along with a constructive discussion for proactive intervention, and offer practitioners a guideline for an improved workflow that mitigates unresolved challenges. To this end, we also demonstrate an efficient pipeline to adapt state-of-the-art approaches to such custom setups without the need for extensive annotations, both offline and in real time. For an objective evaluation, we introduce a validation methodology using a state-of-the-art diffusion model to demonstrate the benefits of our approach.

CV Oct 27, 2025

Yesnt: Are Diffusion Relighting Models Ready for Capture Stage Compositing? A Hybrid Alternative to Bridge the Gap

Elisabeth Jüttner, Leona Krath, Stefan Korfhage et al.

Volumetric video relighting is essential for bringing captured performances into virtual worlds, but current approaches struggle to deliver temporally stable, production-ready results. Diffusion-based intrinsic decomposition methods show promise for single frames, yet suffer from stochastic noise and instability when extended to sequences, while video diffusion models remain constrained by memory and scale. We propose a hybrid relighting framework that combines diffusion-derived material priors with temporal regularization and physically motivated rendering. Our method aggregates multiple stochastic estimates of per-frame material properties into temporally consistent shading components, using optical-flow-guided regularization. For indirect effects such as shadows and reflections, we extract a mesh proxy from Gaussian Opacity Fields and render it within a standard graphics pipeline. Experiments on real and synthetic captures show that this hybrid strategy achieves substantially more stable relighting across sequences than diffusion-only baselines, while scaling beyond the clip lengths feasible for video diffusion. These results indicate that hybrid approaches, which balance learned priors with physically grounded constraints, are a practical step toward production-ready volumetric video relighting.

CV Jun 4, 2024

VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors

Markus Plack, Hannah Dröge, Leif Van Holland et al.

We present a stereo-matching method for depth estimation from high-resolution images using visual hulls as priors, and a memory-efficient technique for the correlation computation. Our method uses object masks extracted from supplementary views of the scene to guide the disparity estimation, effectively reducing the search space for matches. This approach is specifically tailored to stereo rigs in volumetric capture systems, where accurate depth plays a key role in the downstream reconstruction task. To enable training and regression at the high resolutions targeted by recent systems, our approach extends a sparse correlation computation into a hybrid sparse-dense scheme suitable for application in leading recurrent network architectures. We evaluate the performance-efficiency trade-off of our method compared to state-of-the-art methods, and demonstrate the efficacy of the visual hull guidance. In addition, we propose a training scheme for a further reduction of memory requirements during optimization, facilitating training on high-resolution data.

CV Jan 24, 2020

Deep Non-Line-of-Sight Reconstruction

Javier Grau Chopite, Matthias B. Hullin, Michael Wand et al.

Recent years have seen a surge of interest in methods for imaging beyond the direct line of sight. The most prominent techniques rely on time-resolved optical impulse responses, obtained by illuminating a diffuse wall with an ultrashort light pulse and observing multi-bounce indirect reflections with an ultrafast time-resolved imager. Reconstruction of geometry from such data, however, is a complex non-linear inverse problem that comes with substantial computational demands. In this paper, we employ convolutional feed-forward networks for solving the reconstruction problem efficiently while maintaining good reconstruction quality. Specifically, we devise a tailored autoencoder architecture, trained end-to-end, that maps transient images directly to a depth map representation. Training is done using an efficient transient renderer for diffuse three-bounce indirect light transport that enables the quick generation of large amounts of training data for the network. We examine the performance of our method on a variety of synthetic and experimental datasets and its dependency on the choice of training data and augmentation strategies, as well as architectural features. We demonstrate that our feed-forward network, even though it is trained solely on synthetic data, generalizes to measured data from SPAD sensors and is able to obtain results that are competitive with model-based reconstruction methods.

IV Dec 20, 2019

A Calibration Scheme for Non-Line-of-Sight Imaging Setups

Jonathan Klein, Martin Laurenzis, Matthias B. Hullin et al.

Recent years have given rise to a large number of techniques for "looking around corners", i.e., for reconstructing occluded objects from time-resolved measurements of indirect light reflections off a wall. While the direct view of cameras is routinely calibrated in computer vision applications, the calibration of non-line-of-sight setups has so far relied on manual measurement of the most important dimensions (device positions, wall position and orientation, etc.). In this paper, we propose a semi-automatic method for calibrating such systems that relies on mirrors as known targets. A roughly determined initialization is refined in order to optimize spatio-temporal consistency. Our system is general enough to be applicable to a variety of sensing scenarios ranging from single sources/detectors via scanning arrangements to large-scale arrays. It is robust towards bad initialization, and the achieved accuracy is proportional to the depth resolution of the camera system. We demonstrate this capability with a real-world setup and, despite a large number of dead pixels and very low temporal resolution, achieve a result that outperforms a manual calibration.

GR Sep 21, 2018

Non-Line-of-Sight Reconstruction using Efficient Transient Rendering

Julian Iseringhausen, Matthias B. Hullin

Being able to see beyond the direct line of sight is an intriguing prospect and could benefit a wide variety of important applications. Recent work has demonstrated that time-resolved measurements of indirect diffuse light contain valuable information for reconstructing shape and reflectance properties of objects located around a corner. In this paper, we introduce a novel reconstruction scheme that, by design, produces solutions that are consistent with state-of-the-art physically-based rendering. Our method combines an efficient forward model (a custom renderer for time-resolved three-bounce indirect light transport) with an optimization framework to reconstruct object geometry in an analysis-by-synthesis sense. We evaluate our algorithm on a variety of synthetic and experimental input data, and show that it gracefully handles uncooperative scenes with high levels of noise or non-diffuse material reflectance.

CV Jul 19, 2018

Automated Phenotyping of Epicuticular Waxes of Grapevine Berries Using Light Separation and Convolutional Neural Networks

Pierre Barré, Katja Herzog, Rebecca Höfle et al.

In viticulture, the epicuticular wax, the outer layer of the berry skin, is known as a trait that is correlated with resilience towards Botrytis bunch rot. Traditionally, this trait is classified using the OIV descriptor 227 (berry bloom) in a time-consuming way, resulting in subjective and error-prone phenotypic data. In the present study, an objective, fast and sensor-based approach was developed to monitor berry bloom. From the technical point of view, it is known that the measurement of different illumination components conveys important information about observed object surfaces. A Mobile Light-Separation-Lab is proposed in order to capture illumination-separated images of grapevine berries for phenotyping the distribution of epicuticular waxes (berry bloom). For image analysis, an efficient convolutional neural network approach is used to derive the uniformity and intactness of waxes on berries. Method validation over six grapevine cultivars shows accuracies up to 97.3%. In addition, electrical impedance of the cuticle and its epicuticular waxes (described as an indicator for the thickness of berry skin and its permeability) was correlated to the detected proportion of waxes with $r=0.76$. This novel, fast and non-invasive phenotyping approach facilitates enlarged screenings within grapevine breeding material and genetic repositories regarding berry bloom characteristics and their impact on resilience towards Botrytis bunch rot.

HC Jul 3, 2018

A Study of Material Sonification in Touchscreen Devices

Rodrigo Martín, Michael Weinmann, Matthias B. Hullin

Even in the digital age, designers largely rely on physical material samples to illustrate their products, as existing visual representations fail to sufficiently reproduce the look and feel of real-world materials. Here, we investigate the use of interactive material sonification as an additional sensory modality for communicating well-established material qualities like softness, pleasantness or value. We developed a custom application for touchscreen devices that receives tactile input and translates it into material rubbing sounds using granular synthesis. We used this system to perform a psychophysical study, in which the ability of the user to rate subjective material qualities is evaluated, with the actual material samples serving as reference stimulus. Our experimental results indicate that the considered audio cues do not significantly contribute to the perception of material qualities but are able to increase the level of immersion when interacting with digital samples.

HC May 9, 2018

SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence

Patrick Stotko, Stefan Krumpen, Matthias B. Hullin et al.

Real-time 3D scene reconstruction from RGB-D sensor data, as well as the exploration of such data in VR/AR settings, has seen tremendous progress in recent years. The combination of both these components into telepresence systems, however, comes with significant technical challenges. All approaches proposed so far are extremely demanding on input and output devices, compute resources and transmission bandwidth, and they do not reach the level of immediacy required for applications such as remote collaboration. Here, we introduce what we believe is the first practical client-server system for real-time capture and many-user exploration of static 3D scenes. Our system is based on the observation that interactive frame rates are sufficient for capturing and reconstruction, and real-time performance is only required on the client side to achieve lag-free view updates when rendering the 3D model. Starting from this insight, we extend previous voxel block hashing frameworks by overcoming internal dependencies and introducing, to the best of our knowledge, the first thread-safe GPU hash map data structure that is robust under massively concurrent retrieval, insertion and removal of entries on a thread level. We further propose a novel transmission scheme for volume data that is specifically targeted to Marching Cubes geometry reconstruction and enables a 90% reduction in bandwidth between server and exploration clients. The resulting system poses very moderate requirements on network bandwidth, latency and client-side computation, which enables it to rely entirely on consumer-grade hardware, including mobile devices. We demonstrate that our technique achieves state-of-the-art representation accuracy while providing, for any number of clients, an immersive and fluid lag-free viewing experience even during network outages.

CV Jun 3, 2016

Optically lightweight tracking of objects around a corner

Jonathan Klein, Christoph Peters, Jaime Martín et al.

The observation of objects located in inaccessible regions is a recurring challenge in a wide variety of important applications. Recent work has shown that indirect diffuse light reflections can be used to reconstruct objects and two-dimensional (2D) patterns around a corner. However, these prior methods always require some specialized setup involving either ultrafast detectors or narrowband light sources. Here we show that occluded objects can be tracked in real time using a standard 2D camera and a laser pointer. Unlike previous methods based on the backprojection approach, we formulate the problem in an analysis-by-synthesis sense. By repeatedly simulating light transport through the scene, we determine the set of object parameters that most closely fits the measured intensity distribution. We experimentally demonstrate that this approach is capable of following the translation of unknown objects, and translation and orientation of a known object, in real time.