CV | Aug 31, 2023 | Code
BTSeg: Barlow Twins Regularization for Domain Adaptation in Semantic Segmentation
Johannes Künzel, Anna Hilsmann, Peter Eisert
We introduce BTSeg (Barlow Twins regularized Segmentation), an innovative semi-supervised training approach that enhances semantic segmentation models so that they effectively handle adverse weather conditions without requiring additional labeled training data. Images captured at similar locations but under varying adverse conditions are regarded as multiple representations of the same scene, which enables the model to conceptualize its understanding of the environment. BTSeg shows cutting-edge performance on the new, challenging ACG benchmark and sets a new state of the art for weakly-supervised domain adaptation on the ACDC dataset. To support further research, we have made our code publicly available at https://github.com/fraunhoferhhi/BTSeg.
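The regularization named in the title builds on the Barlow Twins redundancy-reduction objective. As a minimal sketch, assuming pooled (N, D) embeddings where row i of both inputs comes from the same location under two different conditions, the generic objective pushes the cross-correlation matrix of the two views toward the identity. The function name, the pooled inputs, and the weight lambda_offdiag = 5e-3 are illustrative assumptions; this abstract does not describe how BTSeg actually applies the loss.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lambda_offdiag: float = 5e-3, eps: float = 1e-9) -> float:
    """Generic Barlow Twins objective for two batches of embeddings of shape (N, D).

    Row i of z_a and row i of z_b are assumed to embed the same scene under two
    different conditions (e.g. clear vs. foggy). Hypothetical helper, not BTSeg code.
    """
    n = z_a.shape[0]
    # Standardize every embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Empirical cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diagonal(c)
    # Invariance term: matching dimensions should be perfectly correlated.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: different dimensions should be decorrelated.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(on_diag + lambda_offdiag * off_diag)
```

Two identical views give a loss close to zero, while unrelated embeddings are penalized mainly through the diagonal term.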
CV | Mar 6, 2023
System for 3D Acquisition and 3D Reconstruction using Structured Light for Sewer Line Inspection
Johannes Künzel, Darko Vehar, Rico Nestler et al.
The assessment of sewer pipe systems is a highly important but cumbersome and error-prone task. We introduce an innovative system based on single-shot structured light modules that facilitates the detection and classification of spatial defects such as jutting intrusions, spallings, or misaligned joints. The system creates highly accurate 3D measurements of pipe surfaces with sub-millimeter resolution and fuses them into a holistic 3D model. Such a model offers two benefits: it facilitates accurate manual sewer pipe assessment, and it simplifies defect detection in downstream automatic systems by endowing their input with highly accurate depth information. In this work, we provide an extensive overview of the system and give valuable insights into our design choices.
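The abstract does not describe the modules' geometry. For a generic, rectified projector-camera pair, depth follows from triangulation as z = f·b/d (focal length f in pixels, baseline b, disparity d), and differentiating gives the depth change per disparity step, |Δz| ≈ z²/(f·b)·Δd, which shows why sub-millimeter resolution is attainable at short range. The sketch below uses hypothetical values (f = 1000 px, b = 50 mm, a surface 100 mm away); they are assumptions, not the system's parameters.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_mm: float) -> float:
    """Depth of a point seen by a rectified projector-camera pair: z = f * b / d.

    disparity_px: shift in pixels between projected and observed pattern feature.
    focal_px: focal length in pixels; baseline_mm: projector-camera distance in mm.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def depth_resolution(depth_mm: float, focal_px: float, baseline_mm: float,
                     disparity_step_px: float = 1.0) -> float:
    """Approximate depth change for a disparity change of disparity_step_px:
    |dz| ~ z^2 / (f * b) * dd (derivative of z = f * b / d)."""
    return depth_mm ** 2 / (focal_px * baseline_mm) * disparity_step_px


# Hypothetical numbers: f = 1000 px, b = 50 mm, surface 100 mm away.
print(depth_from_disparity(500.0, 1000.0, 50.0))  # 100.0
print(depth_resolution(100.0, 1000.0, 50.0))      # 0.2 (mm per pixel of disparity)
```

With sub-pixel disparity estimation the step shrinks proportionally, e.g. to 0.02 mm for a 0.1 px step under the same assumed values.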
CV | Feb 26
Phys-3D: Physics-Constrained Real-Time Crowd Tracking and Counting on Railway Platforms
Bin Zeng, Johannes Künzel, Anna Hilsmann et al.
Accurate, real-time crowd counting on railway platforms is essential for safety and capacity management. We propose using a single camera mounted in a train that scans the platform while the train arrives. While the hardware setup is simple, counting remains challenging due to dense occlusions, camera motion, and perspective distortions during train arrivals. Most existing tracking-by-detection approaches assume static cameras or ignore physical consistency in motion modeling, which leads to unreliable counting under dynamic conditions. We propose a physics-constrained tracking framework that unifies detection, appearance, and 3D motion reasoning in a real-time pipeline. Our approach integrates a transfer-learned YOLOv11m detector with EfficientNet-B0 appearance encoding within DeepSORT and introduces a physics-constrained Kalman model (Phys-3D) that enforces physically plausible 3D motion dynamics through pinhole geometry. To make counting robust to occlusions, we implement a virtual counting band with persistence. On our platform benchmark, the MOT-RailwayPlatformCrowdHead Dataset (MOT-RPCH), our method reduces the counting error to 2.97%, demonstrating robust performance despite motion and occlusions. Our results show that incorporating first-principles geometry and motion priors enables reliable crowd counting in safety-critical transportation scenarios, facilitating effective train scheduling and platform safety management.
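The abstract does not define the virtual counting band. One plausible reading, sketched here purely as an assumption, is a strip of image columns in which a track ID is counted exactly once after it has been observed inside the strip in min_frames updates without being seen outside it, while frames in which the track is missing (for example, because it is occluded) neither advance nor reset its progress. The class CountingBand, its parameters, and this interpretation of "persistence" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CountingBand:
    """Hypothetical virtual counting band over image columns [x_min, x_max]."""
    x_min: float
    x_max: float
    min_frames: int = 3
    streak: dict = field(default_factory=dict)  # track_id -> in-band observations
    counted: set = field(default_factory=set)   # track_ids already counted

    def update(self, tracks: dict) -> int:
        """tracks maps track_id -> (x, y) box center in pixels for the current frame.
        Returns the running count."""
        for tid, (x, _y) in tracks.items():
            if self.x_min <= x <= self.x_max:
                self.streak[tid] = self.streak.get(tid, 0) + 1
                if self.streak[tid] >= self.min_frames:
                    self.counted.add(tid)  # a set, so each ID is counted once
            else:
                self.streak[tid] = 0  # seen outside the band: start over
        # Tracks absent from this frame (e.g. occluded) keep their streak.
        return len(self.counted)
```

Keying the count on track IDs rather than on detections is what prevents a person who lingers in the band from being counted repeatedly.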
CV | Jul 7, 2025 | Code
RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
Johannes Künzel, Anna Hilsmann, Peter Eisert
We introduce RIPE, an innovative reinforcement learning-based framework for weakly-supervised training of a keypoint extractor that excels at both detection and description. In contrast to conventional training regimes that depend heavily on artificial transformations, pre-generated models, or 3D data, RIPE requires only a binary label indicating whether paired images show the same scene. This minimal supervision significantly expands the pool of training data, enabling the creation of a highly generalized and robust keypoint extractor. RIPE describes keypoints using the encoder's intermediate layers, with a hyper-column approach that integrates information from different scales. Additionally, we propose an auxiliary loss that enhances the discriminative capability of the learned descriptors. Comprehensive evaluations on standard benchmarks demonstrate that RIPE simplifies data preparation while achieving performance competitive with state-of-the-art techniques, marking a significant advancement in robust keypoint extraction and description. To support further research, we have made our code publicly available at https://github.com/fraunhoferhhi/RIPE.
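In its generic form, a hyper-column descriptor stacks the features of several encoder layers at the same image location. The sketch below assumes three hypothetical feature maps of decreasing resolution, keypoint positions in normalized coordinates (u, v) in [0, 1], bilinear sampling with an align-corners convention, and a final L2 normalization; RIPE's actual layer choice, interpolation, and normalization are not specified in this abstract.

```python
import numpy as np


def sample_bilinear(fmap: np.ndarray, x: float, y: float) -> np.ndarray:
    """Bilinearly sample a (C, H, W) feature map at continuous pixel coords (x, y)."""
    _, h, w = fmap.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * fmap[:, y0, x0] + dx * fmap[:, y0, x1]
    bottom = (1 - dx) * fmap[:, y1, x0] + dx * fmap[:, y1, x1]
    return (1 - dy) * top + dy * bottom


def hypercolumn_descriptor(feature_maps: list, u: float, v: float) -> np.ndarray:
    """Hypothetical hyper-column: sample every (C_k, H_k, W_k) map at the same
    normalized position (u, v), concatenate along channels, L2-normalize."""
    parts = []
    for fmap in feature_maps:
        _, h, w = fmap.shape
        # Map normalized coordinates onto this layer's own pixel grid.
        parts.append(sample_bilinear(fmap, u * (w - 1), v * (h - 1)))
    desc = np.concatenate(parts)
    return desc / (np.linalg.norm(desc) + 1e-12)
```

Because every layer is sampled at the same normalized position, coarse layers contribute context and fine layers contribute precise local appearance to a single vector.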
CV | Feb 1, 2022
From Explanations to Segmentation: Using Explainable AI for Image Segmentation
Clemens Seibold, Johannes Künzel, Anna Hilsmann et al.
The new era of image segmentation, which leverages the power of Deep Neural Networks (DNNs), comes with a price tag: to train a neural network for pixel-wise segmentation, a large number of training samples must be manually labeled with pixel precision. In this work, we address this problem with an indirect solution. We build upon advances from the Explainable AI (XAI) community and extract a pixel-wise binary segmentation from the output of Layer-wise Relevance Propagation (LRP), which explains the decision of a classification network. We show that our method achieves results similar to those of an established U-Net segmentation architecture, while the generation of training data is significantly simplified. The proposed method can be trained in a weakly supervised fashion, since training samples need only be labeled at the image level, yet it still outputs a segmentation mask. This makes it especially applicable to a wide range of real applications where tedious pixel-level labeling is often not possible.
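For context, LRP redistributes a classifier's output score backwards, layer by layer, onto the input pixels. The sketch below shows the standard LRP-ε rule for a single dense layer, R_i = a_i · Σ_j w_ij R_j / (z_j + ε·sign(z_j)), followed by the simplest possible binarization of a pixel relevance map (keep positive relevance, scale by the maximum, threshold). Both function names and the threshold of 0.5 are illustrative assumptions; how the paper actually turns LRP output into a segmentation is not detailed in this abstract.

```python
import numpy as np


def lrp_epsilon_dense(a, w, b, relevance_out, eps=1e-6):
    """LRP-epsilon rule for one dense layer z = a @ w + b.

    a: (I,) input activations, w: (I, J) weights, b: (J,) biases,
    relevance_out: (J,) relevance of the outputs. Returns (I,) input relevance.
    """
    z = a @ w + b
    # Stabilizer: push each denominator away from zero while keeping its sign.
    denom = z + eps * np.where(z >= 0, 1.0, -1.0)
    s = relevance_out / denom
    # R_i = a_i * sum_j w_ij * R_j / denom_j
    return a * (w @ s)


def relevance_to_mask(relevance, threshold=0.5):
    """Hypothetical binarization of an (H, W) relevance map: keep positive
    evidence only, scale it to [0, 1] by its maximum, and threshold it."""
    pos = np.maximum(relevance, 0.0)
    peak = pos.max()
    if peak == 0:
        return np.zeros(relevance.shape, dtype=bool)
    return pos / peak > threshold
```

With zero biases and a small ε, the rule approximately conserves relevance: the input relevances sum to the output relevance, which is what lets the final map be read as a per-pixel share of the class decision.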
CV | Dec 11, 2019
Automatic Analysis of Sewer Pipes Based on Unrolled Monocular Fisheye Images
Johannes Künzel, Thomas Werner, Ronja Möller et al.
Detecting and classifying damage in sewer pipes is an important application area for computer vision algorithms. This paper describes a system that accomplishes this task based solely on low-quality, severely compressed fisheye images from a pipe inspection robot. Relying on robust image features, we estimate camera poses, model the image lighting, and exploit this information to generate high-quality cylindrical unwraps of the pipes' surfaces. Based on the generated images, we apply semantic labeling with deep convolutional neural networks to detect and classify defects as well as structural elements.
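Under strongly simplifying assumptions, the geometry of a cylindrical unwrap can be sketched as follows: the camera sits on the pipe axis and looks along it, the lens follows the equidistant fisheye model r = f·θ, and the pipe is a perfect cylinder of known radius. Each output row then corresponds to a distance d along the pipe and each column to an angle φ around the axis, so a surface point (R cos φ, R sin φ, d) is seen at θ = atan2(R, d). The paper additionally estimates camera poses and models lighting, which this hypothetical function unwrap_pipe ignores; it also uses nearest-neighbor sampling for brevity.

```python
import numpy as np


def unwrap_pipe(image, f_px, cx, cy, pipe_radius, d_min, d_max, out_h, out_w):
    """Hypothetical cylindrical unwrap of one fisheye frame.

    Assumes a camera centered on the pipe axis, looking along it, with the
    equidistant fisheye model r = f * theta and principal point (cx, cy).
    Rows: distance d along the pipe; columns: angle phi around the axis.
    """
    d = np.linspace(d_min, d_max, out_h)[:, None]                      # (out_h, 1)
    phi = np.linspace(0.0, 2 * np.pi, out_w, endpoint=False)[None, :]  # (1, out_w)
    theta = np.arctan2(pipe_radius, d)  # angle between ray and optical axis
    r = f_px * theta                    # equidistant projection radius in pixels
    u = np.rint(cx + r * np.cos(phi)).astype(int)  # (out_h, out_w) by broadcasting
    v = np.rint(cy + r * np.sin(phi)).astype(int)
    h, w = image.shape[:2]
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # Pixels that project outside the frame stay zero.
    out = np.zeros((out_h, out_w) + image.shape[2:], dtype=image.dtype)
    out[valid] = image[v[valid], u[valid]]  # nearest-neighbor lookup
    return out
```

Rows close to the camera map to large θ and may leave the image; the validity mask simply leaves those pixels black instead of failing.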