CVDec 9, 2022
Neural Volume Super-ResolutionYuval Bahat, Yuxuan Zhang, Hendrik Sommerhoff et al.
Neural volumetric representations have become a widely adopted model for radiance fields in 3D scenes. These representations are fully implicit or hybrid function approximators of the instantaneous volumetric radiance in a scene, which are typically learned from multi-view captures of the scene. We investigate the new task of neural volume super-resolution - rendering high-resolution views corresponding to a scene captured at low resolution. To this end, we propose a neural super-resolution network that operates directly on the volumetric representation of the scene. This approach allows us to exploit an advantage of operating in the volumetric domain, namely the ability to guarantee consistent super-resolution across different viewing directions. To realize our method, we devise a novel 3D representation that hinges on multiple 2D feature planes. This allows us to super-resolve the 3D scene representation by applying 2D convolutional networks on the 2D feature planes. We validate the proposed method by super-resolving multi-view consistent views on a diverse set of unseen 3D scenes, confirming qualitative and quantitatively favorable quality over existing approaches.
CVApr 28, 2023
Differentiable Sensor Layouts for End-to-End Learning of Task-Specific Camera ParametersHendrik Sommerhoff, Shashank Agnihotri, Mohamed Saleh et al.
The success of deep learning is frequently described as the ability to train all parameters of a network on a specific application in an end-to-end fashion. Yet, several design choices on the camera level, including the pixel layout of the sensor, are considered as pre-defined and fixed, and high resolution, regular pixel layouts are considered to be the most generic ones in computer vision and graphics, treating all regions of an image as equally important. While several works have considered non-uniform, \eg, hexagonal or foveated, pixel layouts in hardware and image processing, the layout has not been integrated into the end-to-end learning paradigm so far. In this work, we present the first truly end-to-end trained imaging pipeline that optimizes the size and distribution of pixels on the imaging sensor jointly with the parameters of a given neural network on a specific task. We derive an analytic, differentiable approach for the sensor layout parameterization that allows for task-specific, local varying pixel resolutions. We present two pixel layout parameterization functions: rectangular and curvilinear grid shapes that retain a regular topology. We provide a drop-in module that approximates sensor simulation given existing high-resolution images to directly connect our method with existing deep learning models. We show that network predictions benefit from learnable pixel layouts for two different downstream tasks, classification and semantic segmentation.
20.1GRMay 15
Smart target point control for Gaussian Splatting methodsPratik Singh Bisht, Andreas Kolb
Standard Gaussian splatting methods rely on heuristic densification and pruning to adaptively allocate primitives during training, and the resulting Gaussian count strongly influences both reconstruction quality and runtime. This makes comparisons across methods fragile: improvements can stem from higher representational capacity rather than algorithmic design. A common and naive workaround for this is hard-stopping or budgeting densification/pruning once a target count is reached, which biases training because different methods hit the cap at different times, yielding non-uniform densify/prune exposure across views and uneven point distributions. We propose a target point control scheme that preserves the standard densification window and cadence, but adjusts only the existing densification and opacity-culling hyper-parameters to track a quadratic target count trajectory. This quota-governor reaches the desired count by 15k iterations without abrupt cutoffs, ensuring that all methods and views receive equal densification and pruning cycles, enabling fairer, capacity-matched evaluation.
GRSep 19, 2025
Neural Atlas Graphs for Dynamic Scene Decomposition and EditingJan Philipp Schneider, Pratik Singh Bisht, Ilya Chugunov et al.
Learning editable high-resolution scene representations for dynamic scenes is an open problem with applications across the domains from autonomous driving to creative editing - the most successful approaches today make a trade-off between editability and supporting scene complexity: neural atlases represent dynamic scenes as two deforming image layers, foreground and background, which are editable in 2D, but break down when multiple objects occlude and interact. In contrast, scene graph models make use of annotated data such as masks and bounding boxes from autonomous-driving datasets to capture complex 3D spatial relationships, but their implicit volumetric node representations are challenging to edit view-consistently. We propose Neural Atlas Graphs (NAGs), a hybrid high-resolution scene representation, where every graph node is a view-dependent neural atlas, facilitating both 2D appearance editing and 3D ordering and positioning of scene elements. Fit at test-time, NAGs achieve state-of-the-art quantitative results on the Waymo Open Dataset - by 5 dB PSNR increase compared to existing methods - and make environmental editing possible in high resolution and visual quality - creating counterfactual driving scenarios with new backgrounds and edited vehicle appearance. We find that the method also generalizes beyond driving scenes and compares favorably - by more than 7 dB in PSNR - to recent matting and video editing baselines on the DAVIS video dataset with a diverse set of human and animal-centric scenes. Project Page: https://princeton-computational-imaging.github.io/nag/
CVJul 15, 2025
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained PhenotypingYujie Zhang, Sabine Struckmeyer, Andreas Kolb et al.
Observer bias and inconsistencies in traditional plant phenotyping methods limit the accuracy and reproducibility of fine-grained plant analysis. To overcome these challenges, we developed TomatoMAP, a comprehensive dataset for Solanum lycopersicum using an Internet of Things (IoT) based imaging system with standardized data acquisition protocols. Our dataset contains 64,464 RGB images that capture 12 different plant poses from four camera elevation angles. Each image includes manually annotated bounding boxes for seven regions of interest (ROIs), including leaves, panicle, batch of flowers, batch of fruits, axillary shoot, shoot and whole plant area, along with 50 fine-grained growth stage classifications based on the BBCH scale. Additionally, we provide 3,616 high-resolution image subset with pixel-wise semantic and instance segmentation annotations for fine-grained phenotyping. We validated our dataset using a cascading model deep learning framework combining MobileNetv3 for classification, YOLOv11 for object detection, and MaskRCNN for segmentation. Through AI vs. Human analysis involving five domain experts, we demonstrate that the models trained on our dataset achieve accuracy and speed comparable to the experts. Cohen's Kappa and inter-rater agreement heatmap confirm the reliability of automated fine-grained phenotyping using our approach.
IVMay 13, 2020
A Generative Model for Generic Light Field ReconstructionParamanand Chandramouli, Kanchana Vaishnavi Gandikota, Andreas Goerlitz et al.
Recently deep generative models have achieved impressive progress in modeling the distribution of training data. In this work, we present for the first time a generative model for 4D light field patches using variational autoencoders to capture the data distribution of light field patches. We develop a generative model conditioned on the central view of the light field and incorporate this as a prior in an energy minimization framework to address diverse light field reconstruction tasks. While pure learning-based approaches do achieve excellent results on each instance of such a problem, their applicability is limited to the specific observation model they have been trained on. On the contrary, our trained light field generative model can be incorporated as a prior into any model-based optimization approach and therefore extend to diverse reconstruction tasks including light field view synthesis, spatial-angular super resolution and reconstruction from coded projections. Our proposed method demonstrates good reconstruction, with performance approaching end-to-end trained networks, while outperforming traditional model-based approaches on both synthetic and real scenes. Furthermore, we show that our approach enables reliable light field recovery despite distortions in the input.
CVJul 2, 2019
Training Auto-encoder-based Optimizers for Terahertz Image ReconstructionTak Ming Wong, Matthias Kahl, Peter Haring Bolívar et al.
Terahertz (THz) sensing is a promising imaging technology for a wide variety of different applications. Extracting the interpretable and physically meaningful parameters for such applications, however, requires solving an inverse problem in which a model function determined by these parameters needs to be fitted to the measured data. Since the underlying optimization problem is nonconvex and very costly to solve, we propose learning the prediction of suitable parameters from the measured data directly. More precisely, we develop a model-based autoencoder in which the encoder network predicts suitable parameters and the decoder is fixed to a physically meaningful model function, such that we can train the encoding network in an unsupervised way. We illustrate numerically that the resulting network is more than 140 times faster than classical optimization techniques while making predictions with only slightly higher objective values. Using such predictions as starting points of local optimization techniques allows us to converge to better local minima about twice as fast as optimization without the network-based initialization.
HCJun 18, 2019
On-Body Visualization of Patient Data for Cooperative TasksDmitri Presnov, Julia Kurz, Judith Willkomm et al.
Electronic health records (EHR) systematically represent patient data in digital form. However, text and visualization based EHR systems are poorly integrated in the hospital workflow due to their complex and rather non-intuitive access structure. This is especially disadvantageous in clinical cooperative situations that require an efficient, task specific information transfer. In this paper we introduce a novel concept of anatomically integrated in-place visualization designed to engage with cooperative tasks on a neurosurgical ward. Based on the findings of our field studies and the derived design goals, we propose an approach that follows a visual tradition in medicine, which is tightly related with anatomy, by using a virtual patient's body as spatial representation of visually encoded abstract medical data. More specifically, we provide a generic set of formal requirements for these kinds of in-place visualizations, we apply these requirements in order to achieve a specific visualization of neurological symptoms related to the differential diagnosis of spinal disc herniation, and we present a prototypical implementation of the visualization concept on a mobile device. Moreover, we discuss various challenges related to visual encoding and visibility of the body model components. Finally, the prototype is evaluated by 10 neurosurgeons, who assess the validity and the further potential of the proposed approach.
CVDec 10, 2018
Supervised Deep Kriging for Single-Image Super-ResolutionGianni Franchi, Angela Yao, Andreas Kolb
We propose a novel single-image super-resolution approach based on the geostatistical method of kriging. Kriging is a zero-bias minimum-variance estimator that performs spatial interpolation based on a weighted average of known observations. Rather than solving for the kriging weights via the traditional method of inverting covariance matrices, we propose a supervised form in which we learn a deep network to generate said weights. We combine the kriging weight generation and kriging process into a joint network that can be learned end-to-end. Our network achieves competitive super-resolution results as other state-of-the-art methods. In addition, since the super-resolution process follows a known statistical framework, we are able to estimate bias and variance, something which is rarely possible for other deep networks.
CVNov 6, 2018
A Bit Too Much? High Speed Imaging from Sparse Photon CountsParamanand Chandramouli, Samuel Burri, Claudio Bruschini et al.
Recent advances in photographic sensing technologies have made it possible to achieve light detection in terms of a single photon. Photon counting sensors are being increasingly used in many diverse applications. We address the problem of jointly recovering spatial and temporal scene radiance from very few photon counts. Our ConvNet-based scheme effectively combines spatial and temporal information present in measurements to reduce noise. We demonstrate that using our method one can acquire videos at a high frame rate and still achieve good quality signal-to-noise ratio. Experiments show that the proposed scheme performs quite well in different challenging scenarios while the existing approaches are unable to handle them.
CVSep 26, 2018
Residuum-Condition Diagram and Reduction of Over-Complete Endmember-SetsChristoph Schikora, Markus Plack, Andreas Kolb
Extracting reference spectra, or endmembers (EMs) from a given multi- or hyperspectral image, as well as estimating the size of the EM set, plays an important role in multispectral image processing. In this paper, we present condition-residuum-diagrams. By plotting the residuum resulting from the unmixing and reconstruction and the condition number of various EM sets, the resulting diagram provides insight into the behavior of the spectral unmixing under a varying amount of endmembers (EMs). Furthermore, we utilize condition-residuum-diagrams to realize an EM reduction algorithm that starts with an initially extracted, over-complete EM set. An over-complete EM set commonly exhibits a good unmixing result, i.e. a lower reconstruction residuum, but due to its partial redundancy, the unmixing gets numerically unstable, i.e. the unmixed abundances values are less reliable. Our greedy reduction scheme improves the EM set by reducing the condition number, i.e. enhancing the set's stability, while keeping the reconstruction error as low as possible. The resulting set sequence gives hint to the optimal EM set and its size. We demonstrate the benefit of our condition-residuum-diagram and reduction scheme on well-studied datasets with known reference EM set sizes for several well-known EE algorithms.
CVFeb 5, 2016
Preoperative Volume Determination for Pituitary AdenomaDzenan Zukic, Jan Egger, Miriam H. A. Bauer et al.
The most common sellar lesion is the pituitary adenoma, and sellar tumors are approximately 10-15% of all intracranial neoplasms. Manual slice-by-slice segmentation takes quite some time that can be reduced by using the appropriate algorithms. In this contribution, we present a segmentation method for pituitary adenoma. The method is based on an algorithm that we have applied recently to segmenting glioblastoma multiforme. A modification of this scheme is used for adenoma segmentation that is much harder to perform, due to lack of contrast-enhanced boundaries. In our experimental evaluation, neurosurgeons performed manual slice-by-slice segmentation of ten magnetic resonance imaging (MRI) cases. The segmentations were compared to the segmentation results of the proposed method using the Dice Similarity Coefficient (DSC). The average DSC for all datasets was 75.92% +/- 7.24%. A manual segmentation took about four minutes and our algorithm required about one second.
CVMay 20, 2015
Kinect Range Sensing: Structured-Light versus Time-of-Flight KinectHamed Sarbolandi, Damien Lefloch, Andreas Kolb
Recently, the new Kinect One has been issued by Microsoft, providing the next generation of real-time range sensing devices based on the Time-of-Flight (ToF) principle. As the first Kinect version was using a structured light approach, one would expect various differences in the characteristics of the range data delivered by both devices. This paper presents a detailed and in-depth comparison between both devices. In order to conduct the comparison, we propose a framework of seven different experimental setups, which is a generic basis for evaluating range cameras such as Kinect. The experiments have been designed with the goal to capture individual effects of the Kinect devices as isolatedly as possible and in a way, that they can also be adopted, in order to apply them to any other range sensing device. The overall goal of this paper is to provide a solid insight into the pros and cons of either device. Thus, scientists that are interested in using Kinect range sensing cameras in their specific application scenario can directly assess the expected, specific benefits and potential problem of either device.
GRFeb 8, 2013
User Interface for Volume Rendering in Virtual Reality EnvironmentsJonathan Klein, Dennis Reuling, Jan Grimm et al.
Volume Rendering applications require sophisticated user interaction for the definition and refinement of transfer functions. Traditional 2D desktop user interface elements have been developed to solve this task, but such concepts do not map well to the interaction devices available in Virtual Reality environments. In this paper, we propose an intuitive user interface for Volume Rendering specifically designed for Virtual Reality environments. The proposed interface allows transfer function design and refinement based on intuitive two-handed operation of Wand-like controllers. Additional interaction modes such as navigation and clip plane manipulation are supported as well. The system is implemented using the Sony PlayStation Move controller system. This choice is based on controller device capabilities as well as application and environment constraints. Initial results document the potential of our approach.