Thomas Köhler

CV
h-index: 5
15 papers
120 citations
Novelty: 52%
AI Score: 43

15 Papers

CV Aug 24, 2023
LORD: Leveraging Open-Set Recognition with Unknown Data

Tobias Koch, Christian Riess, Thomas Köhler

Handling entirely unknown data is a challenge for any deployed classifier. Classification models are typically trained on a static pre-defined dataset and are kept in the dark about the open unassigned feature space. As a result, they struggle to deal with out-of-distribution data during inference. Addressing this task on the class-level is termed open-set recognition (OSR). However, most OSR methods are inherently limited, as they train closed-set classifiers and only adapt the downstream predictions to OSR. This work presents LORD, a framework to Leverage Open-set Recognition by exploiting unknown Data. LORD explicitly models open space during classifier training and provides a systematic evaluation for such approaches. We identify three model-agnostic training strategies that exploit background data and apply them to well-established classifiers. Using LORD's extensive evaluation protocol, we consistently demonstrate improved recognition of unknown data. The benchmarks facilitate in-depth analysis across various requirement levels. To mitigate dependency on extensive and costly background datasets, we explore mixup as an off-the-shelf data generation technique. Our experiments highlight mixup's effectiveness as a substitute for background datasets. Lightweight constraints on mixup synthesis further improve OSR performance.
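
As a rough illustration of the mixup idea mentioned in this abstract, the sketch below blends two feature vectors into one synthetic sample that could stand in for background data. The function name `mixup_pair`, the `alpha` value, and the toy vectors are assumptions for illustration; the constraints LORD places on mixup synthesis are not reproduced here.

```python
import random


def mixup_pair(x_a, x_b, alpha=0.4, rng=None):
    """Blend two equal-length feature vectors with a Beta(alpha, alpha) weight.

    Hypothetical helper: generic mixup, not the exact LORD procedure.
    """
    if len(x_a) != len(x_b):
        raise ValueError("feature vectors must have the same length")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam


# Toy usage: mix one sample from each of two known classes.
rng = random.Random(0)
sample, lam = mixup_pair([1.0, 0.0], [0.0, 1.0], rng=rng)
```

The mixed sample lies on the line segment between the two inputs, so it falls between known classes, which is why such samples are a plausible substitute for real background data.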

LG May 30, 2022
Exploring the Open World Using Incremental Extreme Value Machines

Tobias Koch, Felix Liebezeit, Christian Riess et al.

Dynamic environments require adaptive applications. One particular machine learning problem in dynamic environments is open world recognition. It characterizes a continuously changing domain where only some classes are seen in one batch of the training data and such batches can only be learned incrementally. Open world recognition is a demanding task that is, to the best of our knowledge, addressed by only a few methods. This work introduces a modification of the widely known Extreme Value Machine (EVM) to enable open world recognition. Our proposed method extends the EVM with a partial model fitting function by neglecting unaffected space during an update. This reduces the training time by a factor of 28. In addition, we provide a modified model reduction using weighted maximum K-set cover to strictly bound the model complexity and reduce the computational effort by a factor of 3.5 from 2.1 s to 0.6 s. In our experiments, we rigorously evaluate openness with two novel evaluation protocols. The proposed method achieves superior accuracy of about 12 % and computational efficiency in the tasks of image classification and face recognition.
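
To make the term "weighted maximum K-set cover" more concrete, the sketch below shows the standard greedy approximation: pick at most `k` sets so that the total weight of covered elements is as large as possible. This is a generic textbook heuristic under assumed names (`greedy_weighted_k_cover`, the toy `sets` and `weights`), not the model reduction implemented in the paper.

```python
def greedy_weighted_k_cover(sets, weights, k):
    """Greedily choose up to k sets maximizing the weight of covered elements.

    sets:    dict mapping a set name to a set of element ids
    weights: dict mapping an element id to a non-negative weight
    """
    covered = set()
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for name, members in sets.items():
            if name in chosen:
                continue
            # Marginal gain: weight of elements not yet covered.
            gain = sum(weights[m] for m in members - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing left that adds weight
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Toy usage with three candidate sets and a budget of two.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}
weights = {1: 1.0, 2: 1.0, 3: 1.0, 4: 2.0, 5: 2.0}
chosen, covered = greedy_weighted_k_cover(sets, weights, k=2)
```

Fixing the budget `k` is what bounds the size of the result, which matches the abstract's goal of strictly bounding model complexity.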

GR Feb 3
Pi-GS: Sparse-View Gaussian Splatting with Dense π^3 Initialization

Manuel Hofer, Markus Steinberger, Thomas Köhler

Novel view synthesis has evolved rapidly, advancing from Neural Radiance Fields to 3D Gaussian Splatting (3DGS), which offers real-time rendering and rapid training without compromising visual fidelity. However, 3DGS relies heavily on accurate camera poses and high-quality point cloud initialization, which are difficult to obtain in sparse-view scenarios. While traditional Structure from Motion (SfM) pipelines often fail in these settings, existing learning-based point estimation alternatives typically require reliable reference views and remain sensitive to pose or depth errors. In this work, we propose a robust method utilizing π^3, a reference-free point cloud estimation network. We integrate dense initialization from π^3 with a regularization scheme designed to mitigate geometric inaccuracies. Specifically, we employ uncertainty-guided depth supervision, normal consistency loss, and depth warping. Experimental results demonstrate that our approach achieves state-of-the-art performance on the Tanks and Temples, LLFF, DTU, and MipNeRF360 datasets.

GR Jul 1, 2025
A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory

Felix Windisch, Thomas Köhler, Lukas Radl et al.

Gaussian Splatting has emerged as a high-performance technique for novel view synthesis, enabling real-time rendering and high-quality reconstruction of small scenes. However, scaling to larger environments has so far relied on partitioning the scene into chunks -- a strategy that introduces artifacts at chunk boundaries, complicates training across varying scales, and is poorly suited to unstructured scenarios such as city-scale flyovers combined with street-level views. Moreover, rendering remains fundamentally limited by GPU memory, as all visible chunks must reside in VRAM simultaneously. We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU -- without partitioning. Our method stores the full scene out-of-core (e.g., in CPU memory) and trains a Level-of-Detail (LoD) representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering. Together, these innovations enable seamless multi-scale reconstruction and interactive visualization of complex scenes -- from broad aerial views to fine-grained ground-level details.

GR Apr 17, 2025
AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering

Michael Steiner, Thomas Köhler, Lukas Radl et al.

Although 3D Gaussian Splatting (3DGS) has revolutionized 3D reconstruction, it still faces challenges such as aliasing, projection artifacts, and view inconsistencies, primarily due to the simplification of treating splats as 2D entities. We argue that incorporating full 3D evaluation of Gaussians throughout the 3DGS pipeline can effectively address these issues while preserving rasterization efficiency. Specifically, we introduce an adaptive 3D smoothing filter to mitigate aliasing and present a stable view-space bounding method that eliminates popping artifacts when Gaussians extend beyond the view frustum. Furthermore, we promote tile-based culling to 3D with screen-space planes, accelerating rendering and reducing sorting costs for hierarchical rasterization. Our method achieves state-of-the-art quality on in-distribution evaluation sets and significantly outperforms other approaches for out-of-distribution views. Our qualitative evaluations further demonstrate the effective removal of aliasing, distortions, and popping artifacts, ensuring real-time, artifact-free rendering.

CV Nov 10, 2020
Joint Super-Resolution and Rectification for Solar Cell Inspection

Mathis Hoffmann, Thomas Köhler, Bernd Doll et al.

Visual inspection of solar modules is an important monitoring facility in photovoltaic power plants. Since a single measurement of fast CMOS sensors is limited in spatial resolution and often not sufficient to reliably detect small defects, we apply multi-frame super-resolution (MFSR) to a sequence of low-resolution measurements. In addition, the rectification and removal of lens distortion simplifies subsequent analysis. Therefore, we propose to fuse this pre-processing with standard MFSR algorithms. This is advantageous because we omit a separate processing step, the motion estimation becomes more stable, and the spacing of high-resolution (HR) pixels on the rectified module image becomes uniform with respect to the module plane, regardless of perspective distortion. We present a comprehensive user study showing that MFSR is beneficial for defect recognition by human experts and that the proposed method performs better than the state of the art. Furthermore, we apply automated crack segmentation and show that the proposed method performs 3x better than bicubic upsampling and 2x better than the state of the art for automated inspection.

IV Nov 12, 2019
Merging-ISP: Multi-Exposure High Dynamic Range Image Signal Processing

Prashant Chaudhari, Franziska Schirrmacher, Andreas Maier et al.

High dynamic range (HDR) imaging combines multiple images with different exposure times into a single high-quality image. The image signal processing pipeline (ISP) is a core component in digital cameras to perform these operations. It includes demosaicing of raw color filter array (CFA) data at different exposure times, alignment of the exposures, conversion to HDR domain, and exposure merging into an HDR image. Traditionally, such pipelines cascade algorithms that address these individual subtasks. However, cascaded designs suffer from error propagation, since simply combining multiple steps is not necessarily optimal for the entire imaging task. This paper proposes a multi-exposure HDR image signal processing pipeline (Merging-ISP) to jointly solve all these subtasks. Our pipeline is modeled by a deep neural network architecture. As such, it is end-to-end trainable, circumvents the use of hand-crafted and potentially complex algorithms, and mitigates error propagation. Merging-ISP enables direct reconstructions of HDR images of dynamic scenes from multiple raw CFA images with different exposures. We compare Merging-ISP against several state-of-the-art cascaded pipelines. The proposed method provides HDR reconstructions of high perceptual quality and it quantitatively outperforms competing ISPs by more than 1 dB in terms of PSNR.
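
For readers unfamiliar with the metric quoted in this abstract, the sketch below computes the peak signal-to-noise ratio (PSNR) in dB from the mean squared error between a reference and a reconstruction. The function name `psnr`, the flat-list image representation, and the peak value of 1.0 are assumptions for illustration; the paper's evaluation setup for HDR data may differ.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy usage: a constant error of 0.1 on a [0, 1] intensity scale.
value_db = psnr([0.0, 0.0], [0.1, 0.1])
```

Because PSNR is logarithmic, a gain of 1 dB corresponds to a mean squared error that is smaller by a factor of about 1.26.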

CV Dec 21, 2018
Multi-Frame Super-Resolution Reconstruction with Applications to Medical Imaging

Thomas Köhler

The optical resolution of a digital camera is one of its most crucial parameters with broad relevance for consumer electronics, surveillance systems, remote sensing, or medical imaging. However, resolution is physically limited by the optics and sensor characteristics. In addition, practical and economic reasons often stipulate the use of out-dated or low-cost hardware. Super-resolution is a class of retrospective techniques that aims at high-resolution imagery by means of software. Multi-frame algorithms approach this task by fusing multiple low-resolution frames to reconstruct high-resolution images. This work covers novel super-resolution methods along with new applications in medical imaging.

CV Sep 17, 2018
Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data

Thomas Köhler, Michel Bätz, Farzad Naderi et al.

Capturing ground truth data to benchmark super-resolution (SR) is challenging. Therefore, current quantitative studies are mainly evaluated on simulated data artificially sampled from ground truth images. We argue that such evaluations overestimate the actual performance of SR methods compared to their behavior on real images. Toward bridging this simulated-to-real gap, we introduce the Super-Resolution Erlangen (SupER) database, the first comprehensive laboratory SR database of all-real acquisitions with pixel-wise ground truth. It consists of more than 80k images of 14 scenes combining different facets: CMOS sensor noise, real sampling at four resolution levels, nine scene motion types, two photometric conditions, and lossy video coding at five levels. As such, the database exceeds existing benchmarks by an order of magnitude in quality and quantity. This paper also benchmarks 19 popular single-image and multi-frame algorithms on our data. The benchmark comprises a quantitative study by exploiting ground truth data and qualitative evaluations in a large-scale observer study. We also rigorously investigate agreements between both evaluations from a statistical perspective. One interesting result is that top-performing methods on simulated data may be surpassed by others on real data. Our insights can spur further algorithm development, and the publicly available dataset can foster future evaluations.

CV Feb 15, 2018
Learning from a Handful Volumes: MRI Resolution Enhancement with Volumetric Super-Resolution Forests

Aline Sindel, Katharina Breininger, Johannes Käßer et al.

Magnetic resonance imaging (MRI) enables 3-D imaging of anatomical structures. However, the acquisition of MR volumes with high spatial resolution leads to long scan times. To this end, we propose volumetric super-resolution forests (VSRF) to enhance MRI resolution retrospectively. Our method learns a locally linear mapping between low-resolution and high-resolution volumetric image patches by employing random forest regression. We customize features suitable for volumetric MRI to train the random forest and propose a median tree ensemble for robust regression. VSRF outperforms state-of-the-art example-based super-resolution in terms of image quality and efficiency for model training and inference in different MRI datasets. It is also superior to unsupervised methods with just a handful or even a single volume to assemble training data.

CV Feb 12, 2018
Temporal and volumetric denoising via quantile sparse image prior

Franziska Schirrmacher, Thomas Köhler, Tobias Lindenberger et al.

This paper introduces a universal and structure-preserving regularization term, called quantile sparse image (QuaSI) prior. The prior is suitable for denoising images from various medical imaging modalities. We demonstrate its effectiveness on volumetric optical coherence tomography (OCT) and computed tomography (CT) data, which show different noise and image characteristics. OCT offers high-resolution scans of the human retina but is inherently impaired by speckle noise. CT, on the other hand, has a lower resolution and shows high-frequency noise. For the purpose of denoising, we propose a variational framework based on the QuaSI prior and a Huber data fidelity model that can handle 3-D and 3-D+t data. Efficient optimization is facilitated through the use of an alternating direction method of multipliers (ADMM) scheme and the linearization of the quantile filter. Experiments on multiple datasets emphasize the excellent performance of the proposed method.

CV Sep 8, 2017
Benchmarking Super-Resolution Algorithms on Real Data

Thomas Köhler, Michel Bätz, Farzad Naderi et al.

Over the past decades, various super-resolution (SR) techniques have been developed to enhance the spatial resolution of digital images. Despite the great number of methodical contributions, there is still a lack of comparative validations of SR under practical conditions, as capturing real ground truth data is a challenging task. Therefore, current studies are either evaluated 1) on simulated data or 2) on real data without a pixel-wise ground truth. To facilitate comprehensive studies, this paper introduces the publicly available Super-Resolution Erlangen (SupER) database that includes real low-resolution images along with high-resolution ground truth data. Our database comprises image sequences with more than 20k images captured from 14 scenes under various types of motions and photometric conditions. The datasets cover four spatial resolution levels using camera hardware binning. With this database, we benchmark 15 single-image and multi-frame SR algorithms. Our experiments quantitatively analyze SR accuracy and robustness under realistic conditions including independent object and camera motion or photometric variations.

CV Mar 8, 2017
QuaSI: Quantile Sparse Image Prior for Spatio-Temporal Denoising of Retinal OCT Data

Franziska Schirrmacher, Thomas Köhler, Lennart Husvogt et al.

Optical coherence tomography (OCT) enables high-resolution and non-invasive 3D imaging of the human retina but is inherently impaired by speckle noise. This paper introduces a spatio-temporal denoising algorithm for OCT data on a B-scan level using a novel quantile sparse image (QuaSI) prior. To remove speckle noise while preserving image structures of diagnostic relevance, we implement our QuaSI prior via median filter regularization coupled with a Huber data fidelity model in a variational approach. For efficient energy minimization, we develop an alternating direction method of multipliers (ADMM) scheme using a linearization of median filtering. Our spatio-temporal method can handle the denoising of both single B-scans and temporally consecutive B-scans to obtain volumetric OCT data with enhanced signal-to-noise ratio. Using only 4 B-scans, our algorithm achieved performance comparable to averaging 13 B-scans and outperformed other current denoising methods.
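
The Huber data fidelity model named in this abstract can be illustrated with the standard Huber loss, which is quadratic for small residuals and linear for large ones, so outliers such as speckle contribute less than under a squared error. The sketch below is a minimal version under assumed names (`huber`, `huber_fidelity`) and an assumed threshold `delta`; the weighting and parameter choices used in the paper are not reproduced.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss for a single residual."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual  # quadratic near zero
    return delta * (a - 0.5 * delta)      # linear in the tails


def huber_fidelity(observed, estimate, delta=1.0):
    """Sum of Huber losses over pixel-wise residuals (hypothetical helper)."""
    if len(observed) != len(estimate):
        raise ValueError("inputs must have the same length")
    return sum(huber(o - e, delta) for o, e in zip(observed, estimate))


# Toy usage: the outlier (residual 3.0) costs 2.5 rather than 4.5.
cost = huber_fidelity([0.5, 3.0], [0.0, 0.0])
```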

CV Sep 6, 2016
Confidence-aware Levenberg-Marquardt optimization for joint motion estimation and super-resolution

Cosmin Bercea, Andreas Maier, Thomas Köhler

Motion estimation across low-resolution frames and the reconstruction of high-resolution images are two coupled subproblems of multi-frame super-resolution. This paper introduces a new joint optimization approach for motion estimation and image reconstruction to address this interdependence. Our method is formulated via non-linear least squares optimization and combines two principles of robust super-resolution. First, to enhance the robustness of the joint estimation, we propose a confidence-aware energy minimization framework augmented with sparse regularization. Second, we develop a tailor-made Levenberg-Marquardt iteration scheme to jointly estimate motion parameters and the high-resolution image along with the corresponding model confidence parameters. Our experiments on simulated and real images confirm that the proposed approach outperforms decoupled motion estimation and image reconstruction as well as related state-of-the-art joint estimation algorithms.

CV Feb 10, 2016
Super-Resolved Retinal Image Mosaicing

Thomas Köhler, Axel Heinrich, Andreas Maier et al.

The acquisition of high-resolution retinal fundus images with a large field of view (FOV) is challenging for technological, physiological, and economic reasons. This paper proposes a fully automatic framework to reconstruct retinal images of high spatial resolution and increased FOV from multiple low-resolution images captured with non-mydriatic, mobile and video-capable but low-cost cameras. Within the scope of one examination, we scan different regions on the retina by exploiting eye motion induced by patient guidance. Appropriate views for our mosaicing method are selected based on optic disk tracking to trace eye movements. For each view, one super-resolved image is reconstructed by fusion of multiple video frames. Finally, all super-resolved views are registered to a common reference using a novel polynomial registration scheme and combined by means of image mosaicing. We evaluated our framework for a mobile and low-cost video fundus camera. In our experiments, we reconstructed retinal images of up to 30° FOV from 10 complementary views of 15° FOV. An evaluation of the mosaics by human experts as well as a quantitative comparison to conventional color fundus images support the clinical usability of our framework.