Stefan Becker

CV
h-index30
21papers
185citations
Novelty40%
AI Score46

21 Papers

CVMar 3, 2022
Revisiting Click-based Interactive Video Object Segmentation

Stephane Vujasinovic, Sebastian Bullinger, Stefan Becker et al.

While current methods for interactive Video Object Segmentation (iVOS) rely on scribble-based interactions to generate precise object masks, we propose a Click-based interactive Video Object Segmentation (CiVOS) framework to simplify the required user workload as much as possible. CiVOS builds on de-coupled modules reflecting user interaction and mask propagation. The interaction module converts click-based interactions into an object mask, which is then inferred to the remaining frames by the propagation module. Additional user interactions allow for a refinement of the object mask. The approach is extensively evaluated on the popular interactive~DAVIS dataset, but with an inevitable adaptation of scribble-based interactions with click-based counterparts. We consider several strategies for generating clicks during our evaluation to reflect various user inputs and adjust the DAVIS performance metric to perform a hardware-independent comparison. The presented CiVOS pipeline achieves competitive results, although requiring a lower user workload.

CVJul 31, 2024Code
Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation

Stéphane Vujasinović, Stefan Becker, Sebastian Bullinger et al.

In this paper, we introduce a variant of video object segmentation (VOS) that bridges interactive and semi-automatic approaches, termed Lazy Video Object Segmentation (ziVOS). In contrast, to both tasks, which handle video object segmentation in an off-line manner (i.e., pre-recorded sequences), we propose through ziVOS to target online recorded sequences. Here, we strive to strike a balance between performance and robustness for long-term scenarios by soliciting user feedback's on-the-fly during the segmentation process. Hence, we aim to maximize the tracking duration of an object of interest, while requiring minimal user corrections to maintain tracking over an extended period. We propose a competitive baseline, i.e., Lazy-XMem, as a reference for future works in ziVOS. Our proposed approach uses an uncertainty estimation of the tracking state to determine whether a user interaction is necessary to refine the model's prediction. To quantitatively assess the performance of our method and the user's workload, we introduce complementary metrics alongside those already established in the field. We evaluate our approach using the recently introduced LVOS dataset, which offers numerous long-term videos. Our code is publicly available at https://github.com/Vujas-Eteph/LazyXMem.

CVJan 8Code
Higher-Order Adversarial Patches for Real-Time Object Detectors

Jens Bayer, Stefan Becker, David Münch et al.

Higher-order adversarial attacks can directly be considered the result of a cat-and-mouse game -- an elaborate action involving constant pursuit, near captures, and repeated escapes. This idiom describes the enduring circular training of adversarial attack patterns and adversarial training the best. The following work investigates the impact of higher-order adversarial attacks on object detectors by successively training attack patterns and hardening object detectors with adversarial training. The YOLOv10 object detector is chosen as a representative, and adversarial patches are used in an evasion attack manner. Our results indicate that higher-order adversarial patches are not only affecting the object detector directly trained on but rather provide a stronger generalization capacity compared to lower-order adversarial patches. Moreover, the results highlight that solely adversarial training is not sufficient to harden an object detector efficiently against this kind of adversarial attack. Code: https://github.com/JensBayer/HigherOrder

MLMay 3, 2022
Bézier Curve Gaussian Processes

Ronny Hug, Stefan Becker, Wolfgang Hübner et al.

Probabilistic models for sequential data are the basis for a variety of applications concerned with processing timely ordered information. The predominant approach in this domain is given by recurrent neural networks, implementing either an approximate Bayesian approach (e.g. Variational Autoencoders or Generative Adversarial Networks) or a regression-based approach, i.e. variations of Mixture Density networks (MDN). In this paper, we focus on the $\mathcal{N}$-MDN variant, which parameterizes (mixtures of) probabilistic Bézier curves ($\mathcal{N}$-Curves) for modeling stochastic processes. While in favor in terms of computational cost and stability, MDNs generally fall behind approximate Bayesian approaches in terms of expressiveness. Towards this end, we present an approach for closing this gap by enabling full Bayesian inference on top of $\mathcal{N}$-MDNs. For this, we show that $\mathcal{N}$-Curves are a special case of Gaussian processes (denoted as $\mathcal{N}$-GP) and then derive corresponding mean and kernel functions for different modalities. Following this, we propose the use of the $\mathcal{N}$-MDN as a data-dependent generator for $\mathcal{N}$-GP prior distributions. We show the advantages granted by this combined model in an application context, using human trajectory prediction as an example.

CVJun 19, 2023
Eigenpatches -- Adversarial Patches from Principal Components

Jens Bayer, Stefan Becker, David Münch et al.

Adversarial patches are still a simple yet powerful white box attack that can be used to fool object detectors by suppressing possible detections. The patches of these so-called evasion attacks are computational expensive to produce and require full access to the attacked detector. This paper addresses the problem of computational expensiveness by analyzing 375 generated patches, calculating the principal components of these and show, that linear combinations of the resulting "eigenpatches" can be used to fool object detections successfully.

CVNov 16, 2023
Utilizing dataset affinity prediction in object detection to assess training data

Stefan Becker, Jens Bayer, Ronny Hug et al.

Data pooling offers various advantages, such as increasing the sample size, improving generalization, reducing sampling bias, and addressing data sparsity and quality, but it is not straightforward and may even be counterproductive. Assessing the effectiveness of pooling datasets in a principled manner is challenging due to the difficulty in estimating the overall information content of individual datasets. Towards this end, we propose incorporating a data source prediction module into standard object detection pipelines. The module runs with minimal overhead during inference time, providing additional information about the data source assigned to individual detections. We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets. The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.

CVAug 28, 2024
Network transferability of adversarial patches in real-time object detection

Jens Bayer, Stefan Becker, David Münch et al.

Adversarial patches in computer vision can be used, to fool deep neural networks and manipulate their decision-making process. One of the most prominent examples of adversarial patches are evasion attacks for object detectors. By covering parts of objects of interest, these patches suppress the detections and thus make the target object 'invisible' to the object detector. Since these patches are usually optimized on a specific network with a specific train dataset, the transferability across multiple networks and datasets is not given. This paper addresses these issues and investigates the transferability across numerous object detector architectures. Our extensive evaluation across various models on two distinct datasets indicates that patches optimized with larger models provide better network transferability than patches that are optimized with smaller models.

CVMay 15
A Causally Grounded Taxonomy for Image Degradation Robustness Evaluation

Stefan Becker, Simon Weiss, Wolfgang Hübner et al.

Image degradations can occur during acquisition, processing, and transmission, altering visual appearance and affecting downstream vision tasks. They are studied in several communities, including synthetic corruption benchmarks for robustness evaluation, perceptual image quality assessment, and physically grounded analyses of imaging systems or real camera failures. Although these areas address closely related phenomena, they often use incompatible grouping schemes and backend specific severity definitions, making results difficult to compare across datasets, degradation sources, and tasks. We propose a causally grounded framework for organizing and interpreting image degradations across these settings. Instead of introducing new degradations or redefining existing benchmarks, we provide an interpretive representation and measurement layer that makes implicit assumptions explicit. Each degradation is described along two orthogonal axes: its dominant causal source in the imaging pipeline (environment, sensor/optics, ISP/renderer/codec, or transfer/system), and its resulting perceptual effect. This dual axis abstraction yields a compact taxonomy spanning algorithmic corruptions, perceptual distortions, and physically motivated imaging artifacts. To address inconsistent severity semantics without changing existing implementations, we introduce a lightweight severity measurement layer. For every degradation and each native severity level of a given backend, we quantify degradation strength using full reference image quality metrics: PSNR, SSIM, and LPIPS. This makes severity observable and comparable across sources while preserving native parameterizations. We demonstrate the framework through COCO Degradation, a taxonomy aligned benchmark for evaluating object detector robustness under diverse imaging conditions.

CVDec 2, 2024
Traversing the Subspace of Adversarial Patches

Jens Bayer, Stefan Becker, David Münch et al.

Despite ongoing research on the topic of adversarial examples in deep learning for computer vision, some fundamentals of the nature of these attacks remain unclear. As the manifold hypothesis posits, high-dimensional data tends to be part of a low-dimensional manifold. To verify the thesis with adversarial patches, this paper provides an analysis of a set of adversarial patches and investigates the reconstruction abilities of three different dimensionality reduction methods. Quantitatively, the performance of reconstructed patches in an attack setting is measured and the impact of sampled patches from the latent space during adversarial training is investigated. The evaluation is performed on two publicly available datasets for person detection. The results indicate that more sophisticated dimensionality reduction methods offer no advantages over a simple principal component analysis.

CVFeb 20
Self-Aware Object Detection via Degradation Manifolds

Stefan Becker, Simon Weiss, Wolfgang Hübner et al.

Object detectors achieve strong performance under nominal imaging conditions but can fail silently when exposed to blur, noise, compression, adverse weather, or resolution changes. In safety-critical settings, it is therefore insufficient to produce predictions without assessing whether the input remains within the detector's nominal operating regime. We refer to this capability as self-aware object detection. We introduce a degradation-aware self-awareness framework based on degradation manifolds, which explicitly structure a detector's feature space according to image degradation rather than semantic content. Our method augments a standard detection backbone with a lightweight embedding head trained via multi-layer contrastive learning. Images sharing the same degradation composition are pulled together, while differing degradation configurations are pushed apart, yielding a geometrically organized representation that captures degradation type and severity without requiring degradation labels or explicit density modeling. To anchor the learned geometry, we estimate a pristine prototype from clean training embeddings, defining a nominal operating point in representation space. Self-awareness emerges as geometric deviation from this reference, providing an intrinsic, image-level signal of degradation-induced shift that is independent of detection confidence. Extensive experiments on synthetic corruption benchmarks, cross-dataset zero-shot transfer, and natural weather-induced distribution shifts demonstrate strong pristine-degraded separability, consistent behavior across multiple detector architectures, and robust generalization under semantic shift. These results suggest that degradation-aware representation geometry provides a practical and detector-agnostic foundation.

LGApr 5, 2024
Generating Synthetic Ground Truth Distributions for Multi-step Trajectory Prediction using Probabilistic Composite Bézier Curves

Ronny Hug, Stefan Becker, Wolfgang Hübner et al.

An appropriate data basis grants one of the most important aspects for training and evaluating probabilistic trajectory prediction models based on neural networks. In this regard, a common shortcoming of current benchmark datasets is their limitation to sets of sample trajectories and a lack of actual ground truth distributions, which prevents the use of more expressive error metrics, such as the Wasserstein distance for model evaluation. Towards this end, this paper proposes a novel approach to synthetic dataset generation based on composite probabilistic Bézier curves, which is capable of generating ground truth data in terms of probability distributions over full trajectories. This allows the calculation of arbitrary posterior distributions. The paper showcases an exemplary trajectory prediction model evaluation using generated ground truth distribution data.

CVMay 22, 2023
READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation

Stéphane Vujasinović, Sebastian Bullinger, Stefan Becker et al.

We present READMem (Robust Embedding Association for a Diverse Memory), a modular framework for semi-automatic video object segmentation (sVOS) methods designed to handle unconstrained videos. Contemporary sVOS works typically aggregate video frames in an ever-expanding memory, demanding high hardware resources for long-term applications. To mitigate memory requirements and prevent near object duplicates (caused by information of adjacent frames), previous methods introduce a hyper-parameter that controls the frequency of frames eligible to be stored. This parameter has to be adjusted according to concrete video properties (such as rapidity of appearance changes and video length) and does not generalize well. Instead, we integrate the embedding of a new frame into the memory only if it increases the diversity of the memory content. Furthermore, we propose a robust association of the embeddings stored in the memory with query embeddings during the update process. Our approach avoids the accumulation of redundant data, allowing us in return, to restrict the memory size and prevent extreme memory demands in long videos. We extend popular sVOS baselines with READMem, which previously showed limited performance on long videos. Our approach achieves competitive results on the Long-time Video dataset (LV1) while not hindering performance on short sequences. Our code is publicly available.

CVJul 1, 2021
Generating Synthetic Training Data for Deep Learning-Based UAV Trajectory Prediction

Stefan Becker, Ronny Hug, Wolfgang Hübner et al.

Deep learning-based models, such as recurrent neural networks (RNNs), have been applied to various sequence learning tasks with great success. Following this, these models are increasingly replacing classic approaches in object tracking applications for motion prediction. On the one hand, these models can capture complex object dynamics with less modeling required, but on the other hand, they depend on a large amount of training data for parameter tuning. Towards this end, we present an approach for generating synthetic trajectory data of unmanned-aerial-vehicles (UAVs) in image space. Since UAVs, or rather quadrotors are dynamical systems, they can not follow arbitrary trajectories. With the prerequisite that UAV trajectories fulfill a smoothness criterion corresponding to a minimal change of higher-order motion, methods for planning aggressive quadrotors flights can be utilized to generate optimal trajectories through a sequence of 3D waypoints. By projecting these maneuver trajectories, which are suitable for controlling quadrotors, to image space, a versatile trajectory data set is realized. To demonstrate the applicability of the synthetic trajectory data, we show that an RNN-based prediction model solely trained on the generated data can outperform classic reference models on a real-world UAV tracking dataset. The evaluation is done on the publicly available ANTI-UAV dataset.

CVJun 30, 2021
MissFormer: (In-)attention-based handling of missing observations for trajectory filtering and prediction

Stefan Becker, Ronny Hug, Wolfgang Hübner et al.

In applications such as object tracking, time-series data inevitably carry missing observations. Following the success of deep learning-based models for various sequence learning tasks, these models increasingly replace classic approaches in object tracking applications for inferring the objects' motion states. While traditional tracking approaches can deal with missing observations, most of their deep counterparts are, by default, not suited for this. Towards this end, this paper introduces a transformer-based approach for handling missing observations in variable input length trajectory data. The model is formed indirectly by successively increasing the complexity of the demanded inference tasks. Starting from reproducing noise-free trajectories, the model then learns to infer trajectories from noisy inputs. By providing missing tokens, binary-encoded missing events, the model learns to in-attend to missing data and infers a complete trajectory conditioned on the remaining inputs. In the case of a sequence of successive missing events, the model then acts as a pure prediction model. The abilities of the approach are demonstrated on synthetic data and real-world data reflecting prototypical object tracking scenarios.

CVMar 22, 2021
Handling Missing Observations with an RNN-based Prediction-Update Cycle

Stefan Becker, Ronny Hug, Wolfgang Hübner et al.

In tasks such as tracking, time-series data inevitably carry missing observations. While traditional tracking approaches can handle missing observations, recurrent neural networks (RNNs) are designed to receive input data in every step. Furthermore, current solutions for RNNs, like omitting the missing data or data imputation, are not sufficient to account for the resulting increased uncertainty. Towards this end, this paper introduces an RNN-based approach that provides a full temporal filtering cycle for motion state estimation. The Kalman filter inspired approach, enables to deal with missing observations and outliers. For providing a full temporal filtering cycle, a basic RNN is extended to take observations and the associated belief about its accuracy into account for updating the current state. An RNN prediction model, which generates a parametrized distribution to capture the predicted states, is combined with an RNN update model, which relies on the prediction model output and the current observation. By providing the model with masking information, binary-encoded missing events, the model can overcome limitations of standard techniques for dealing with missing input values. The model abilities are demonstrated on synthetic data reflecting prototypical pedestrian tracking scenarios.

CVAug 6, 2020
Integration of the 3D Environment for UAV Onboard Visual Object Tracking

Stéphane Vujasinović, Stefan Becker, Timo Breuer et al.

Single visual object tracking from an unmanned aerial vehicle (UAV) poses fundamental challenges such as object occlusion, small-scale objects, background clutter, and abrupt camera motion. To tackle these difficulties, we propose to integrate the 3D structure of the observed scene into a detection-by-tracking algorithm. We introduce a pipeline that combines a model-free visual object tracker, a sparse 3D reconstruction, and a state estimator. The 3D reconstruction of the scene is computed with an image-based Structure-from-Motion (SfM) component that enables us to leverage a state estimator in the corresponding 3D scene during tracking. By representing the position of the target in 3D space rather than in image space, we stabilize the tracking during ego-motion and improve the handling of occlusions, background clutter, and small-scale objects. We evaluated our approach on prototypical image sequences, captured from a UAV with low-altitude oblique views. For this purpose, we adapted an existing dataset for visual object tracking and reconstructed the observed scene in 3D. The experimental results demonstrate that the proposed approach outperforms methods using plain visual cues as well as approaches leveraging image-space-based state estimations. We believe that our approach can be beneficial for traffic monitoring, video surveillance, and navigation.

CVMay 28, 2020
Quantifying the Complexity of Standard Benchmarking Datasets for Long-Term Human Trajectory Prediction

Ronny Hug, Stefan Becker, Wolfgang Hübner et al.

Methods to quantify the complexity of trajectory datasets are still a missing piece in benchmarking human trajectory prediction models. In order to gain a better understanding of the complexity of trajectory prediction tasks and following the intuition, that more complex datasets contain more information, an approach for quantifying the amount of information contained in a dataset from a prototype-based dataset representation is proposed. The dataset representation is obtained by first employing a non-trivial spatial sequence alignment, which enables a subsequent learning vector quantization (LVQ) stage. A large-scale complexity analysis is conducted on several human trajectory prediction benchmarking datasets, followed by a brief discussion on indications for human trajectory prediction and benchmarking.

LGMar 27, 2020
A Short Note on Analyzing Sequence Complexity in Trajectory Prediction Benchmarks

Ronny Hug, Stefan Becker, Wolfgang Hübner et al.

The analysis and quantification of sequence complexity is an open problem frequently encountered when defining trajectory prediction benchmarks. In order to enable a more informative assembly of a data basis, an approach for determining a dataset representation in terms of a small set of distinguishable prototypical sub-sequences is proposed. The approach employs a sequence alignment followed by a learning vector quantization (LVQ) stage. A first proof of concept on synthetically generated and real-world datasets shows the viability of the approach.

CVFeb 5, 2019
An RNN-based IMM Filter Surrogate

Stefan Becker, Ronny Hug, Wolfgang Hübner et al.

The problem of varying dynamics of tracked objects, such as pedestrians, is traditionally tackled with approaches like the Interacting Multiple Model (IMM) filter using a Bayesian formulation. By following the current trend towards using deep neural networks, in this paper an RNN-based IMM filter surrogate is presented. Similar to an IMM filter solution, the presented RNN-based model assigns a probability value to a performed dynamic and, based on them, puts out a multi-modal distribution over future pedestrian trajectories. The evaluation is done on synthetic data, reflecting prototypical pedestrian maneuvers.

CVMay 19, 2018
An Evaluation of Trajectory Prediction Approaches and Notes on the TrajNet Benchmark

Stefan Becker, Ronny Hug, Wolfgang Hübner et al.

In recent years, there is a shift from modeling the tracking problem based on Bayesian formulation towards using deep neural networks. Towards this end, in this paper the effectiveness of various deep neural networks for predicting future pedestrian paths are evaluated. The analyzed deep networks solely rely, like in the traditional approaches, on observed tracklets without human-human interaction information. The evaluation is done on the publicly available TrajNet benchmark dataset, which builds up a repository of considerable and popular datasets for trajectory-based activity forecasting. We show that a Recurrent-Encoder with a Dense layer stacked on top, referred to as RED-predictor, is able to achieve sophisticated results compared to elaborated models in such scenarios. Further, we investigate failure cases and give explanations for observed phenomena and give some recommendations for overcoming demonstrated shortcomings.

CVApr 16, 2018
Particle-based pedestrian path prediction using LSTM-MDL models

Ronny Hug, Stefan Becker, Wolfgang Hübner et al.

Recurrent neural networks are able to learn complex long-term relationships from sequential data and output a pdf over the state space. Therefore, recurrent models are a natural choice to address path prediction tasks, where a trained model is used to generate future expectations from past observations. When applied to security applications, like predicting the path of pedestrians for risk assessment, a point-wise greedy (ML) evaluation of the output pdf is not feasible, since the environment often allows multiple choices. Therefore, a robust risk assessment has to take all options into account, even if they are overall not very likely. Towards this end, a combination of particle filter sampling strategies and a LSTM-MDL model is proposed to address a multi-modal path prediction task. The capabilities and viability of the proposed approach are evaluated on several synthetic test conditions, yielding the counter-intuitive result that the simplest approach performs best. Further, the feasibility of the proposed approach is illustrated on several real world scenes.