49.7 | CV | May 26 | Code
Self-Intersection-Aware 3D Human Motion Generation Using an Efficient Human Sphere Proxy
Pascal Herrmann, Maarten Bieshaar, Dennis Mack et al.
Human motion generation has made tremendous progress in recent years, with state-of-the-art approaches surpassing ground truth data in leading evaluation benchmarks. However, visual inspection of the generated motions paints a different picture. Even state-of-the-art approaches generate motions frequently containing self-intersections, i.e., body parts interpenetrating, which are strong artifacts, severely limiting the perceived motion quality. We introduce a novel loss, which explicitly penalizes self-intersections, to the training of human motion generation methods. We base our loss on a sphere proxy of human geometry, which allows us to calculate a self-intersection loss 98% faster and uses 83% less memory than comparable methods based on triangular meshes. The loss is agnostic to the specific approach, and we add it to the training of the recent human motion generation methods human motion diffusion model (MDM) and MoMask. Our extensive experiments show a reduction of self-intersections in generated motions of up to 49% while improving other evaluation metrics. The code is available at https://github.com/boschresearch/humansphereproxy .
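The idea of penalizing overlaps between proxy spheres can be illustrated with a minimal sketch. This is not the authors' loss: the function name `self_intersection_penalty`, the penetration-depth formulation, and the `ignore_pairs` argument (for spheres that are expected to touch, such as neighbors on the same limb) are assumptions made for illustration only.

```python
import math
from itertools import combinations


def self_intersection_penalty(centers, radii, ignore_pairs=frozenset()):
    """Sum of penetration depths over all overlapping sphere pairs.

    centers: list of (x, y, z) tuples, one per proxy sphere (hypothetical layout)
    radii: list of sphere radii, same length as centers
    ignore_pairs: set of (i, j) index pairs with i < j that are skipped
    """
    total = 0.0
    for i, j in combinations(range(len(centers)), 2):
        if (i, j) in ignore_pairs:
            continue
        dist = math.dist(centers[i], centers[j])
        # Two spheres overlap when the center distance is below the radius sum.
        depth = radii[i] + radii[j] - dist
        if depth > 0.0:
            total += depth
    return total


# Toy example: spheres 0 and 1 overlap, sphere 2 is far away.
centers = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (1.0, 0.0, 0.0)]
radii = [0.1, 0.1, 0.1]
penalty = self_intersection_penalty(centers, radii)
penalty_ignoring_neighbors = self_intersection_penalty(
    centers, radii, ignore_pairs=frozenset({(0, 1)})
)
```

A sphere-sphere test needs only one distance and one comparison per pair, which gives some intuition for why a sphere proxy can be cheaper than triangle-mesh intersection tests. A training loss would additionally need to be differentiable and batched.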
RO | Oct 17, 2022
Space, Time, and Interaction: A Taxonomy of Corner Cases in Trajectory Datasets for Automated Driving
Kevin Rösch, Florian Heidecker, Julian Truetsch et al.
Trajectory data analysis is an essential component for highly automated driving. Complex models developed with these data predict other road users' movement and behavior patterns. Based on these predictions - and additional contextual information such as the course of the road, (traffic) rules, and interaction with other road users - the highly automated vehicle (HAV) must be able to reliably and safely perform the task assigned to it, e.g., moving from point A to B. Ideally, the HAV moves safely through its environment, just as we would expect a human driver to do. However, if unusual trajectories occur, so-called trajectory corner cases, a human driver can usually cope well, but an HAV can quickly get into trouble. In the definition of trajectory corner cases, which we provide in this work, we will consider the relevance of unusual trajectories with respect to the task at hand. Based on this, we will also present a taxonomy of different trajectory corner cases. The categorization of corner cases into the taxonomy will be shown with examples and is done by cause and required data sources. To illustrate the complexity between the machine learning (ML) model and the corner case cause, we present a general processing chain underlying the taxonomy.
CV | Aug 19, 2025 | Code
RICO: Two Realistic Benchmarks and an In-Depth Analysis for Incremental Learning in Object Detection
Matthias Neuwirth-Trapp, Maarten Bieshaar, Danda Pani Paudel et al.
Incremental Learning (IL) trains models sequentially on new data without full retraining, offering privacy, efficiency, and scalability. IL must balance adaptability to new data with retention of old knowledge. However, evaluations often rely on synthetic, simplified benchmarks, obscuring real-world IL performance. To address this, we introduce two Realistic Incremental Object Detection Benchmarks (RICO): Domain RICO (D-RICO) features domain shifts with a fixed class set, and Expanding-Classes RICO (EC-RICO) integrates new domains and classes per IL step. Built from 14 diverse datasets covering real and synthetic domains, varying conditions (e.g., weather, time of day), camera sensors, perspectives, and labeling policies, both benchmarks capture challenges absent in existing evaluations. Our experiments show that all IL methods underperform in adaptability and retention, while replaying a small amount of previous data already outperforms all methods. However, individual training on the data remains superior. We heuristically attribute this gap to weak teachers in distillation, single models' inability to manage diverse tasks, and insufficient plasticity. Our code will be made publicly available.
56.9 | CV | May 12
MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation
Sonali Godavarthy, Matthias Neuwirth-Trapp, Tim-Felix Faasch et al.
Recent text-to-image models produce high-quality images, yet text ambiguity hinders precise control when specific styles or objects are required. There have been a number of recent works dealing with learning and composing multiple objects and patterns. However, current work focuses almost entirely on image content, overlooking imaging factors such as camera lens, sensor types, imaging viewpoints, and scenes' domain characteristics. We introduce this new challenge as Imaging Factor Disentanglement and show limitations of current approaches in the regime. We, therefore, propose the new method Multi-factor disentanglement through Textual Inversion (MULTI). It consists of two stages: in the first stage, we learn general factors, and in the second stage, we extract dataset-specific ones. This setup enables the extension of existing datasets and novel factor combinations, thereby reducing distribution gaps. It further supports modifications of specific factors and image-to-image generation via ControlNets. The evaluation on our new DF-RICO benchmark demonstrates the effectiveness of MULTI and highlights the importance of Factor Disentanglement as a new direction of research.
LG | Feb 5, 2024
A Safety-Adapted Loss for Pedestrian Detection in Automated Driving
Maria Lyssenko, Piyush Pimplikar, Maarten Bieshaar et al.
In safety-critical domains like automated driving (AD), errors by the object detector may endanger pedestrians and other vulnerable road users (VRU). As common evaluation metrics are not an adequate safety indicator, recent works employ approaches to identify safety-critical VRU and back-annotate the risk to the object detector. However, those approaches do not consider the safety factor in the deep neural network (DNN) training process. Thus, state-of-the-art DNN penalizes all misdetections equally irrespective of their criticality. Subsequently, to mitigate the occurrence of critical failure cases, i.e., false negatives, a safety-aware training strategy might be required to enhance the detection performance for critical pedestrians. In this paper, we propose a novel safety-aware loss variation that leverages the estimated per-pedestrian criticality scores during training. We exploit the reachability set-based time-to-collision (TTC-RSB) metric from the motion domain along with distance information to account for the worst-case threat quantifying the criticality. Our evaluation results using RetinaNet and FCOS on the nuScenes dataset demonstrate that training the models with our safety-aware loss function mitigates the misdetection of critical pedestrians without sacrificing performance for the general case, i.e., pedestrians outside the safety-critical zone.
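How a per-pedestrian criticality score might enter a training loss can be sketched as follows. This is an illustrative assumption, not the loss proposed in the paper: the weighting `1 + scale * criticality`, the function name `criticality_weighted_bce`, and the use of binary cross-entropy are all hypothetical choices, and the criticality values are taken as given rather than computed from TTC-RSB.

```python
import math


def criticality_weighted_bce(probs, labels, criticality, scale=1.0):
    """Mean binary cross-entropy with per-sample criticality weights.

    probs: predicted pedestrian probabilities in (0, 1)
    labels: ground-truth labels (1 = pedestrian, 0 = background)
    criticality: hypothetical per-sample scores in [0, 1], higher = more critical
    scale: how strongly criticality increases the weight (assumed hyperparameter)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y, c in zip(probs, labels, criticality):
        bce = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        total += (1.0 + scale * c) * bce
    return total / len(probs)


# Two pedestrians: the first is poorly detected and safety-critical.
probs = [0.2, 0.9]
labels = [1, 1]
plain = criticality_weighted_bce(probs, labels, criticality=[0.0, 0.0])
weighted = criticality_weighted_bce(probs, labels, criticality=[1.0, 0.0])
```

With all criticality scores at zero the function reduces to ordinary cross-entropy, so misdetections of non-critical pedestrians are penalized as before, while a missed critical pedestrian contributes more to the loss.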
CV | Aug 20, 2025
Incremental Object Detection with Prompt-based Methods
Matthias Neuwirth-Trapp, Maarten Bieshaar, Danda Pani Paudel et al.
Visual prompt-based methods have seen growing interest in incremental learning (IL) for image classification. These approaches learn additional embedding vectors while keeping the model frozen, making them efficient to train. However, no prior work has applied such methods to incremental object detection (IOD), leaving their generalizability unclear. In this paper, we analyze three different prompt-based methods under a complex domain-incremental learning setting. We additionally provide a wide range of reference baselines for comparison. Empirically, we show that the prompt-based approaches we tested underperform in this setting. However, a strong yet practical method, combining visual prompts with replaying a small portion of previous data, achieves the best results. Together with additional experiments on prompt length and initialization, our findings offer valuable insights for advancing prompt-based IL in IOD.
CV | Apr 17, 2024
Criteria for Uncertainty-based Corner Cases Detection in Instance Segmentation
Florian Heidecker, Ahmad El-Khateeb, Maarten Bieshaar et al.
The operating environment of a highly automated vehicle is subject to change, e.g., weather, illumination, or the scenario containing different objects and other participants in which the highly automated vehicle has to navigate its passengers safely. These situations must be considered when developing and validating highly automated driving functions. This already poses a problem for training and evaluating deep learning models: without the costly labeling of thousands of recordings, and without knowing whether the data contains relevant, interesting samples for further model training, one can only guess under which conditions and situations the model performs poorly. For this purpose, we present corner case criteria based on the predictive uncertainty. With our corner case criteria, we are able to detect uncertainty-based corner cases of an object instance segmentation model without relying on ground truth (GT) data. We evaluated each corner case criterion using the COCO and the NuImages dataset to analyze the potential of our approach. We also provide a corner case decision function that allows us to classify each object as a True Positive (TP), a localization and/or classification corner case, or a False Positive (FP). We also present our first results of an iterative training cycle that outperforms the baseline and where the data added to the training dataset is selected based on the corner case decision function.
LG | Sep 20, 2021
Description of Corner Cases in Automated Driving: Goals and Challenges
Daniel Bogdoll, Jasmin Breitenstein, Florian Heidecker et al.
Scaling the distribution of automated vehicles requires handling various unexpected and possibly dangerous situations, termed corner cases (CC). Since many modules of automated driving systems are based on machine learning (ML), CC are an essential part of the data for their development. However, there is only a limited amount of CC data in large-scale data collections, which makes them challenging in the context of ML. With a better understanding of CC, offline applications, e.g., dataset analysis, and online methods, e.g., improved performance of automated driving systems, can be improved. While there are knowledge-based descriptions and taxonomies for CC, there is little research on machine-interpretable descriptions. In this extended abstract, we will give a brief overview of the challenges and goals of such a description.
LG | May 4, 2021
Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and Autoencoders
Felix Möller, Diego Botache, Denis Huseljic et al.
Deep neural networks often suffer from overconfidence which can be partly remedied by improved out-of-distribution detection. For this purpose, we propose a novel approach that allows for the generation of out-of-distribution datasets based on a given in-distribution dataset. This new dataset can then be used to improve out-of-distribution detection for the given dataset and machine learning task at hand. The samples in this dataset are with respect to the feature space close to the in-distribution dataset and therefore realistic and plausible. Hence, this dataset can also be used to safeguard neural networks, i.e., to validate the generalization performance. Our approach first generates suitable representations of an in-distribution dataset using an autoencoder and then transforms them using our novel proposed Soft Brownian Offset method. After transformation, the decoder part of the autoencoder allows for the generation of these implicit out-of-distribution samples. This newly generated dataset then allows for mixing with other datasets and thus improved training of an out-of-distribution classifier, increasing its performance. Experimentally, we show that our approach is promising for time series using synthetic data. Using our new method, we also show in a quantitative case study that we can improve the out-of-distribution detection for the MNIST dataset. Finally, we provide another case study on the synthetic generation of out-of-distribution trajectories, which can be used to validate trajectory prediction algorithms for automated driving.
CV | Mar 5, 2021
An Application-Driven Conceptualization of Corner Cases for Perception in Highly Automated Driving
Florian Heidecker, Jasmin Breitenstein, Kevin Rösch et al.
Systems and functions that rely on machine learning (ML) are the basis of highly automated driving. An essential task of such ML models is to reliably detect and interpret unusual, new, and potentially dangerous situations. The detection of those situations, which we refer to as corner cases, is highly relevant for successfully developing, applying, and validating automotive perception functions in future vehicles where multiple sensor modalities will be used. A complication for the development of corner case detectors is the lack of consistent definitions, terms, and corner case descriptions, especially when taking into account various automotive sensors. In this work, we provide an application-driven view of corner cases in highly automated driving. To achieve this goal, we first consider existing definitions from the general outlier, novelty, anomaly, and out-of-distribution detection to show relations and differences to corner cases. Moreover, we extend an existing camera-focused systematization of corner cases by adding RADAR (radio detection and ranging) and LiDAR (light detection and ranging) sensors. For this, we describe an exemplary toolchain for data acquisition and processing, highlighting the interfaces of the corner case detection. We also define a novel level of corner cases, the method layer corner cases, which appear due to uncertainty inherent in the methodology or the data distribution.
AP | Sep 29, 2020
Quantile Surfaces -- Generalizing Quantile Regression to Multivariate Targets
Maarten Bieshaar, Jens Schreiber, Stephan Vogt et al.
In this article, we present a novel approach to multivariate probabilistic forecasting. Our approach is based on an extension of single-output quantile regression (QR) to multivariate targets, called quantile surfaces (QS). QS uses a simple yet compelling idea of indexing observations of a probabilistic forecast through direction and vector length to estimate a central tendency. We extend the single-output QR technique to multivariate probabilistic targets. QS efficiently models dependencies in multivariate target variables and represents probability distributions through discrete quantile levels. Therefore, we present a novel two-stage process. In the first stage, we perform a deterministic point forecast (i.e., central tendency estimation). Subsequently, we model the prediction uncertainty using QS involving neural networks called quantile surface regression neural networks (QSNN). Additionally, we introduce new methods for efficient and straightforward evaluation of the reliability and sharpness of the issued probabilistic QS predictions. We complement this with a directional extension of the Continuous Ranked Probability Score (CRPS). Finally, we evaluate our novel approach on synthetic data and two currently researched real-world challenges in two different domains: first, probabilistic forecasting for renewable energy power generation; second, short-term cyclist trajectory forecasting for autonomously driving vehicles. Especially for the latter, our empirical results show that even a simple one-layer QSNN outperforms traditional parametric multivariate forecasting techniques, thus improving the state-of-the-art performance.
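For readers unfamiliar with single-output quantile regression, which the abstract takes as its starting point, the standard pinball (quantile) loss can be sketched as below. This shows only the classical single-output loss, not the quantile surface method itself; the function names are chosen here for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball loss for one observation at quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs tau * diff,
    # over-prediction (diff < 0) costs (1 - tau) * |diff|.
    return max(tau * diff, (tau - 1.0) * diff)


def mean_pinball_loss(targets, predictions, tau):
    """Average pinball loss over paired targets and predictions."""
    losses = [pinball_loss(y, q, tau) for y, q in zip(targets, predictions)]
    return sum(losses) / len(losses)


# A constant prediction of 2.0 evaluated as an estimate of the 0.9 quantile.
targets = [1.0, 2.0, 3.0]
predictions = [2.0, 2.0, 2.0]
loss_q90 = mean_pinball_loss(targets, predictions, tau=0.9)
```

Minimizing this loss over a model's outputs yields an estimate of the conditional tau-quantile, and the asymmetry in the two cost terms is what distinguishes it from a mean absolute error.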
LG | Apr 29, 2020
Extended Coopetitive Soft Gating Ensemble
Stephan Deist, Jens Schreiber, Maarten Bieshaar et al.
This article is about an extension of a recent ensemble method called Coopetitive Soft Gating Ensemble (CSGE) and its application on power forecasting as well as motion primitive forecasting of cyclists. The CSGE has been used successfully in the field of wind power forecasting, outperforming common algorithms in this domain. The principal idea of the CSGE is to weight the models regarding their observed performance during training on different aspects. Several extensions are proposed to the original CSGE within this article, making the ensemble even more flexible and powerful. The extended CSGE (XCSGE as we term it), is used to predict the power generation on both wind- and solar farms. Moreover, the XCSGE is applied to forecast the movement state of cyclists in the context of driver assistance systems. Both domains have different requirements, are non-trivial problems, and are used to evaluate various facets of the novel XCSGE. The two problems differ fundamentally in the size of the data sets and the number of features. Power forecasting is based on weather forecasts that are subject to fluctuations in their features. In the movement primitive forecasting of cyclists, time delays contribute to the difficulty of the prediction. The XCSGE reaches an improvement of the prediction performance of up to 11% for wind power forecasting and 30% for solar power forecasting compared to the worst performing model. For the classification of movement primitives of cyclists, the XCSGE reaches an improvement of up to 28%. The evaluation includes a comparison with other state-of-the-art ensemble methods. We can verify that the XCSGE results are significantly better using the Nemenyi post-hoc test.
AI | Jan 14, 2020
Knowledge Representations in Technical Systems -- A Taxonomy
Kristina Scharei, Florian Heidecker, Maarten Bieshaar
The recent usage of technical systems in human-centric environments leads to the question, how to teach technical systems, e.g., robots, to understand, learn, and perform tasks desired by the human. Therefore, an accurate representation of knowledge is essential for the system to work as expected. This article mainly gives insight into different knowledge representation techniques and their categorization into various problem domains in artificial intelligence. Additionally, applications of presented knowledge representations are introduced in everyday robotics tasks. By means of the provided taxonomy, the search for a proper knowledge representation technique regarding a specific problem should be facilitated.
AI | Jan 13, 2020
Multi-Sensor Data and Knowledge Fusion -- A Proposal for a Terminology Definition
Silvia Beddar-Wiesing, Maarten Bieshaar
Fusion is a common tool for the analysis and utilization of available datasets and so an essential part of data mining and machine learning processes. However, a clear definition of the type of fusion is not always provided due to inconsistent literature. In the following, the process of fusion is defined depending on the fusion components and the abstraction level on which the fusion occurs. The focus in the first part of the paper at hand is on the clear definition of the terminology and the development of an appropriate ontology of the fusion components and the fusion level. In the second part, common fusion techniques are presented.
AI | Sep 11, 2018
Detecting Intentions of Vulnerable Road Users Based on Collective Intelligence
Maarten Bieshaar, Günther Reitberger, Stefan Zernetsch et al.
Vulnerable road users (VRUs, i.e., cyclists and pedestrians) will play an important role in future traffic. To avoid accidents and achieve a highly efficient traffic flow, it is important to detect VRUs and to predict their intentions. In this article, a holistic approach for detecting intentions of VRUs by cooperative methods is presented. The intention detection consists of basic movement primitive prediction, e.g., standing, moving, turning, and a forecast of the future trajectory. Vehicles equipped with sensors, data processing systems, and communication abilities, referred to as intelligent vehicles, acquire and maintain a local model of their surrounding traffic environment, e.g., crossing cyclists. Heterogeneous, open sets of agents (cooperating and interacting vehicles, infrastructure, e.g., cameras and laser scanners, and VRUs equipped with smart devices and body-worn sensors) exchange information, forming a multi-modal sensor system with the goal of reliably and robustly detecting VRUs and their intentions under consideration of real-time requirements and uncertainties. The resulting model makes it possible to extend the perceptual horizon of the individual agent beyond its own sensory capabilities, enabling a longer forecast horizon. Concealments, implausibilities, and inconsistencies are resolved by the collective intelligence of cooperating agents. Novel techniques of signal processing and modelling, in combination with analytical and learning-based approaches of pattern and activity recognition, are used for detection as well as intention prediction of VRUs. Cooperation, by means of probabilistic sensor and knowledge fusion, takes place on the level of perception and intention recognition. Based on the communication requirements of the cooperative approach, a new strategy for an ad hoc network is proposed.
CV | Aug 8, 2018
Smart Device based Initial Movement Detection of Cyclists using Convolutional Neuronal Networks
Jan Schneegans, Maarten Bieshaar
For future traffic scenarios, we envision interconnected traffic participants who exchange information about their current state, e.g., position, and their predicted intentions, allowing them to act in a cooperative manner. Vulnerable road users (VRUs), e.g., pedestrians and cyclists, will be equipped with smart devices that can be used to detect their intentions and transmit these detected intentions to approaching cars so that their drivers can be warned. In this article, we focus on detecting the initial movement of cyclists using smart devices. Smart devices provide the necessary sensors, namely accelerometer and gyroscope, and are therefore an excellent instrument for quickly detecting movement transitions (e.g., waiting to moving). Convolutional Neural Networks prove to be the state-of-the-art solution for many problems with an ever-increasing range of applications. Therefore, we model the initial movement detection as a classification problem. In terms of Organic Computing (OC), it can be seen as a step towards self-awareness and self-adaptation. We apply residual network architectures to the task of detecting the initial starting movement of cyclists.
CV | Aug 8, 2018
Starting Movement Detection of Cyclists Using Smart Devices
Maarten Bieshaar, Malte Depping, Jan Schneegans et al.
In the near future, vulnerable road users (VRUs) such as cyclists and pedestrians will be equipped with smart devices and wearables which are capable of communicating with intelligent vehicles and other traffic participants. Road users are then able to cooperate on different levels, such as in cooperative intention detection for advanced VRU protection. Smart devices can be used to detect intentions, e.g., an occluded cyclist intending to cross the road, to warn vehicles of VRUs, and to prevent potential collisions. This article presents a human activity recognition approach to detect the starting movement of cyclists wearing smart devices. We propose a novel two-stage feature selection procedure using a score specialized for robust starting detection, reducing the false positive detections and leading to understandable and interpretable features. The detection is modelled as a classification problem and realized by means of a machine learning classifier. We introduce an auxiliary class that models starting movements and allows early movement indicators, i.e., body part movements indicating future behaviour, to be integrated. In this way we improve the robustness and reduce the detection time of the classifier. Our empirical studies with real-world data originating from experiments which involve 49 test subjects and consist of 84 starting motions show that we are able to detect the starting movements early. Our approach reaches an F1-score of 67% within 0.33 s after the first movement of the bicycle wheel. Investigations concerning the device wearing location show that for devices worn in the trouser pocket the detector has fewer false detections and detects starting movements faster on average. We found that we can further improve the results when we train distinct classifiers for different wearing locations.
LG | Jul 3, 2018
Coopetitive Soft Gating Ensemble
Stephan Deist, Maarten Bieshaar, Jens Schreiber et al.
In this article, we propose the Coopetitive Soft Gating Ensemble or CSGE for general machine learning tasks and interwoven systems. The goal of machine learning is to create models that generalize well for unknown datasets. Often, however, the problems are too complex to be solved with a single model, so several models are combined. Similarly, Autonomic Computing requires the integration of different systems. Here, especially, the local, temporal online evaluation and the resulting (re-)weighting scheme of the CSGE make the approach highly applicable for self-improving system integrations. To achieve the best potential performance, the CSGE can be optimized according to arbitrary loss functions, making it accessible for a broader range of problems. We introduce a novel training procedure including a hyper-parameter initialisation at its heart. We show that the CSGE approach reaches state-of-the-art performance for both classification and regression tasks. Furthermore, the CSGE provides a human-readable quantification of the influence of all base estimators employing the three weighting aspects. Moreover, we provide a scikit-learn compatible implementation.
CV | Mar 9, 2018
Cooperative Starting Movement Detection of Cyclists Using Convolutional Neural Networks and a Boosted Stacking Ensemble
Maarten Bieshaar, Stefan Zernetsch, Andreas Hubert et al.
In the future, vehicles and other traffic participants will be interconnected and equipped with various types of sensors, allowing for cooperation on different levels, such as situation prediction or intention detection. In this article, we present a cooperative approach for starting movement detection of cyclists using a boosted stacking ensemble approach realizing feature- and decision-level cooperation. We introduce a novel method based on a 3D Convolutional Neural Network (CNN) to detect starting motions on image sequences by learning spatio-temporal features. The CNN is complemented by a starting movement detection based on smart devices carried by the cyclist. Both model outputs are combined in a stacking ensemble approach using an extreme gradient boosting classifier, resulting in a fast yet robust cooperative starting movement detector. We evaluate our cooperative approach on real-world data originating from experiments with 49 test subjects consisting of 84 starting motions.
AI | Mar 9, 2018
Highly Automated Learning for Improved Active Safety of Vulnerable Road Users
Maarten Bieshaar, Günther Reitberger, Viktor Kreß et al.
Highly automated driving requires precise models of traffic participants. Many state of the art models are currently based on machine learning techniques. Among others, the required amount of labeled data is one major challenge. An autonomous learning process addressing this problem is proposed. The initial models are iteratively refined in three steps: (1) detection and context identification, (2) novelty detection and active learning and (3) online model adaption.
HC | Mar 6, 2018
Where is my Device? - Detecting the Smart Device's Wearing Location in the Context of Active Safety for Vulnerable Road Users
Maarten Bieshaar
This article describes an approach to detect the wearing location of smart devices worn by pedestrians and cyclists. The detection, which is based solely on the sensors of the smart devices, is important context-information which can be used to parametrize subsequent algorithms, e.g. for dead reckoning or intention detection to improve the safety of vulnerable road users. The wearing location recognition can in terms of Organic Computing (OC) be seen as a step towards self-awareness and self-adaptation. For the wearing location detection a two-stage process is presented. It is subdivided into moving detection followed by the wearing location classification. Finally, the approach is evaluated on a real world dataset consisting of pedestrians and cyclists.
CY | Mar 6, 2018
Cooperative Tracking of Cyclists Based on Smart Devices and Infrastructure
Günther Reitberger, Stefan Zernetsch, Maarten Bieshaar et al.
In future traffic scenarios, vehicles and other traffic participants will be interconnected and equipped with various types of sensors, allowing for cooperation based on data or information exchange. This article presents an approach to cooperative tracking of cyclists using smart devices and infrastructure-based sensors. A smart device is carried by the cyclists and an intersection is equipped with a wide angle stereo camera system. Two tracking models are presented and compared. The first model is based on the stereo camera system detections only, whereas the second model cooperatively combines the camera based detections with velocity and yaw rate data provided by the smart device. Our aim is to overcome limitations of tracking approaches based on single data sources. We show in numerical evaluations on scenes where cyclists are starting or turning right that the cooperation leads to an improvement in both the ability to keep track of a cyclist and the accuracy of the track particularly when it comes to occlusions in the visual system. We, therefore, contribute to the safety of vulnerable road users in future traffic.