CVJul 12, 2023Code
The IMPTC Dataset: An Infrastructural Multi-Person Trajectory and Context DatasetManuel Hetzel, Hannes Reichert, Günther Reitberger et al.
Inner-city intersections are among the most critical traffic areas for injury and fatal accidents. Automated vehicles struggle with the complex and hectic everyday life within those areas. Sensor-equipped smart infrastructures, which can cooperate with vehicles, can benefit automated traffic by extending the perception capabilities of drivers and vehicle perception systems. Additionally, they offer the opportunity to gather reproducible and precise data of a holistic scene understanding, including context information as a basis for training algorithms for various applications in automated traffic. Therefore, we introduce the Infrastructural Multi-Person Trajectory and Context Dataset (IMPTC). We use an intelligent public inner-city intersection in Germany with visual sensor technology. A multi-view camera and LiDAR system perceives traffic situations and road users' behavior. Additional sensors monitor contextual information like weather, lighting, and traffic light signal status. The data acquisition system focuses on Vulnerable Road Users (VRUs) and multi-agent interaction. The resulting dataset consists of eight hours of measurement data. It contains over 2,500 VRU trajectories, including pedestrians, cyclists, e-scooter riders, strollers, and wheelchair users, and over 20,000 vehicle trajectories at different day times, weather conditions, and seasons. In addition, to enable the entire stack of research capabilities, the dataset includes all data, starting from the sensor-, calibration- and detection data until trajectory and context data. The dataset is continuously expanded and is available online for non-commercial research at https://github.com/kav-institute/imptc-dataset.
52.7LGJun 3
ALINC: Active Learning for Inductive Node Classification via Graph SamplingPascal Plettenberg, Denis Huseljic, André Alcalde et al.
Active learning (AL) for node classification typically focuses on selecting the most informative nodes for annotation within one or a few large graphs (e.g., in social network analysis). However, in other domains, such as molecular chemistry or electronic design automation, datasets consist of thousands of independent graphs. In many of these inductive settings, annotating an individual node requires a full-graph analysis, which effectively yields the remaining node labels on-the-fly. Therefore, these scenarios require AL strategies that select entire graphs instead of single nodes, a problem which has not been tackled in the literature so far. Thus, we introduce ALINC, an AL framework for inductive node classification via graph sampling. It bridges the existing methodological gap by elevating node-level utility measures to graph-level selection criteria through various aggregation mechanisms. In an extensive benchmark including ten strategies, three aggregation methods, and four datasets, we identify CoreSet, TypiClust, and BADGE as the top-performing graph sampling strategies. Our detailed analysis further reveals that the choice of the aggregation method is pivotal, as it substantially affects model performance and annotation costs. Finally, we demonstrate the effectiveness of ALINC in two use case studies: site-of-metabolism prediction in molecules and design automation of printed circuit board schematics.
ROOct 17, 2022
Space, Time, and Interaction: A Taxonomy of Corner Cases in Trajectory Datasets for Automated DrivingKevin Rösch, Florian Heidecker, Julian Truetsch et al.
Trajectory data analysis is an essential component for highly automated driving. Complex models developed with these data predict other road users' movement and behavior patterns. Based on these predictions - and additional contextual information such as the course of the road, (traffic) rules, and interaction with other road users - the highly automated vehicle (HAV) must be able to reliably and safely perform the task assigned to it, e.g., moving from point A to B. Ideally, the HAV moves safely through its environment, just as we would expect a human driver to do. However, if unusual trajectories occur, so-called trajectory corner cases, a human driver can usually cope well, but an HAV can quickly get into trouble. In the definition of trajectory corner cases, which we provide in this work, we will consider the relevance of unusual trajectories with respect to the task at hand. Based on this, we will also present a taxonomy of different trajectory corner cases. The categorization of corner cases into the taxonomy will be shown with examples and is done by cause and required data sources. To illustrate the complexity between the machine learning (ML) model and the corner case cause, we present a general processing chain underlying the taxonomy.
LGApr 28, 2022
Model Selection, Adaptation, and Combination for Transfer Learning in Wind and Photovoltaic Power ForecastsJens Schreiber, Bernhard Sick
There is recent interest in using model hubs, a collection of pre-trained models, in computer vision tasks. To utilize the model hub, we first select a source model and then adapt the model for the target to compensate for differences. While there is yet limited research on model selection and adaption for computer vision tasks, this holds even more for the field of renewable power. At the same time, it is a crucial challenge to provide forecasts for the increasing demand for power forecasts based on weather features from a numerical weather prediction. We close these gaps by conducting the first thorough experiment for model selection and adaptation for transfer learning in renewable power forecast, adopting recent results from the field of computer vision on 667 wind and photovoltaic parks. To the best of our knowledge, this makes it the most extensive study for transfer learning in renewable power forecasts reducing the computational effort and improving the forecast error. Therefore, we adopt source models based on target data from different seasons and limit the amount of training data. As an extension of the current state of the art, we utilize a Bayesian linear regression for forecasting the response based on features extracted from a neural network. This approach outperforms the baseline with only seven days of training data. We further show how combining multiple models through ensembles can significantly improve the model selection and adaptation approach.
LGJun 16, 2023
ActiveGLAE: A Benchmark for Deep Active Learning with TransformersLukas Rauch, Matthias Aßenmacher, Denis Huseljic et al.
Deep active learning (DAL) seeks to reduce annotation costs by enabling the model to actively query instance annotations from which it expects to learn the most. Despite extensive research, there is currently no standardized evaluation protocol for transformer-based language models in the field of DAL. Diverse experimental settings lead to difficulties in comparing research and deriving recommendations for practitioners. To tackle this challenge, we propose the ActiveGLAE benchmark, a comprehensive collection of data sets and evaluation guidelines for assessing DAL. Our benchmark aims to facilitate and streamline the evaluation process of novel DAL strategies. Additionally, we provide an extensive overview of current practice in DAL with transformer-based language models. We identify three key challenges - data set selection, model training, and DAL settings - that pose difficulties in comparing query strategies. We establish baseline results through an extensive set of experiments as a reference point for evaluating future work. Based on our findings, we provide guidelines for researchers and practitioners.
LGApr 5, 2023
Multi-annotator Deep Learning: A Probabilistic Framework for ClassificationMarek Herde, Denis Huseljic, Bernhard Sick
Solving complex classification tasks using deep neural networks typically requires large amounts of annotated data. However, corresponding class labels are noisy when provided by error-prone annotators, e.g., crowdworkers. Training standard deep neural networks leads to subpar performances in such multi-annotator supervised learning settings. We address this issue by presenting a probabilistic training framework named multi-annotator deep learning (MaDL). A downstream ground truth and an annotator performance model are jointly trained in an end-to-end learning approach. The ground truth model learns to predict instances' true class labels, while the annotator performance model infers probabilistic estimates of annotators' performances. A modular network architecture enables us to make varying assumptions regarding annotators' performances, e.g., an optional class or instance dependency. Further, we learn annotator embeddings to estimate annotators' densities within a latent space as proxies of their potentially correlated annotations. Together with a weighted loss function, we improve the learning from correlated annotation patterns. In a comprehensive evaluation, we examine three research questions about multi-annotator supervised learning. Our findings show MaDL's state-of-the-art performance and robustness against many correlated, spamming annotators.
LGApr 29, 2022
Task Embedding Temporal Convolution Networks for Transfer Learning Problems in Renewable Power Time-Series ForecastJens Schreiber, Stephan Vogt, Bernhard Sick
Task embeddings in multi-layer perceptrons for multi-task learning and inductive transfer learning in renewable power forecasts have recently been introduced. In many cases, this approach improves the forecast error and reduces the required training data. However, it does not take the seasonal influences in power forecasts within a day into account, i.e., the diurnal cycle. Therefore, we extended this idea to temporal convolutional networks to consider those seasonalities. We propose transforming the embedding space, which contains the latent similarities between tasks, through convolution and providing these results to the network's residual block. The proposed architecture significantly improves up to 25 percent for multi-task learning for power forecasts on the EuropeWindFarm and GermanSolarFarm dataset compared to the multi-layer perceptron approach. Based on the same data, we achieve a ten percent improvement for the wind datasets and more than 20 percent in most cases for the solar dataset for inductive transfer learning without catastrophic forgetting. Finally, we are the first proposing zero-shot learning for renewable power forecasts to provide predictions even if no training data is available.
SDAug 14, 2023
Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with TransformersLukas Rauch, Raphael Schwinger, Moritz Wirth et al.
We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised (SSL) and deep active learning (DAL). Leveraging transformer models, we aim to bypass traditional spectrogram conversions, enabling direct raw audio processing. ActiveBird2Vec is set to generate high-quality bird sound representations through SSL, potentially accelerating the assessment of environmental changes and decision-making processes for wind farms. Additionally, we seek to utilize the wide variety of bird vocalizations through DAL, reducing the reliance on extensively labeled datasets by human experts. We plan to curate a comprehensive set of tasks through Huggingface Datasets, enhancing future comparability and reproducibility of bioacoustic research. A comparative analysis between various transformer models will be conducted to evaluate their proficiency in bird sound recognition tasks. We aim to accelerate the progression of avian bioacoustic research and contribute to more effective conservation strategies.
CVApr 29, 2023
Sensor Equivariance by LiDAR Projection ImagesHannes Reichert, Manuel Hetzel, Steven Schreck et al.
In this work, we propose an extension of conventional image data by an additional channel in which the associated projection properties are encoded. This addresses the issue of sensor-dependent object representation in projection-based sensors, such as LiDAR, which can lead to distorted physical and geometric properties due to variations in sensor resolution and field of view. To that end, we propose an architecture for processing this data in an instance segmentation framework. We focus specifically on LiDAR as a key sensor modality for machine vision tasks and highly automated driving (HAD). Through an experimental setup in a controlled synthetic environment, we identify a bias on sensor resolution and field of view and demonstrate that our proposed method can reduce said bias for the task of LiDAR instance segmentation. Furthermore, we define our method such that it can be applied to other projection-based sensors, such as cameras. To promote transparency, we make our code and dataset publicly available. This method shows the potential to improve performance and robustness in various machine vision tasks that utilize projection-based sensors.
CVJul 12, 2023
Smart Infrastructure: A Research JunctionManuel Hetzel, Hannes Reichert, Konrad Doll et al.
Complex inner-city junctions are among the most critical traffic areas for injury and fatal accidents. The development of highly automated driving (HAD) systems struggles with the complex and hectic everyday life within those areas. Sensor-equipped smart infrastructures, which can communicate and cooperate with vehicles, are essential to enable a holistic scene understanding to resolve occlusions drivers and vehicle perception systems for themselves can not cover. We introduce an intelligent research infrastructure equipped with visual sensor technology, located at a public inner-city junction in Aschaffenburg, Germany. A multiple-view camera system monitors the traffic situation to perceive road users' behavior. Both motorized and non-motorized traffic is considered. The system is used for research in data generation, evaluating new HAD sensors systems, algorithms, and Artificial Intelligence (AI) training strategies using real-, synthetic- and augmented data. In addition, the junction features a highly accurate digital twin. Real-world data can be taken into the digital twin for simulation purposes and synthetic data generation.
LGJul 26, 2023
Unraveling the Complexity of Splitting Sequential Data: Tackling Challenges in Video and Time Series AnalysisDiego Botache, Kristina Dingel, Rico Huhnstock et al.
Splitting of sequential data, such as videos and time series, is an essential step in various data analysis tasks, including object tracking and anomaly detection. However, splitting sequential data presents a variety of challenges that can impact the accuracy and reliability of subsequent analyses. This concept article examines the challenges associated with splitting sequential data, including data acquisition, data representation, split ratio selection, setting up quality criteria, and choosing suitable selection strategies. We explore these challenges through two real-world examples: motor test benches and particle tracking in liquids.
LGSep 22, 2023
Enhancing Multi-Objective Optimization through Machine Learning-Supported Multiphysics SimulationDiego Botache, Jens Decke, Winfried Ripken et al.
This paper presents a methodological framework for training, self-optimising, and self-organising surrogate models to approximate and speed up multiobjective optimisation of technical systems based on multiphysics simulations. At the hand of two real-world datasets, we illustrate that surrogate models can be trained on relatively small amounts of data to approximate the underlying simulations accurately. Including explainable AI techniques allow for highlighting feature relevancy or dependencies and supporting the possible extension of the used datasets. One of the datasets was created for this paper and is made publicly available for the broader scientific community. Extensive experiments combine four machine learning and deep learning algorithms with an evolutionary optimisation algorithm. The performance of the combined training and optimisation pipeline is evaluated by verifying the generated Pareto-optimal results using the ground truth simulations. The results from our pipeline and a comprehensive evaluation strategy show the potential for efficiently acquiring solution candidates in multiobjective optimisation tasks by reducing the number of simulations and conserving a higher prediction accuracy, i.e., with a MAPE score under 5% for one of the presented use cases.
LGOct 12, 2022
Efficient Bayesian Updates for Deep Learning via Laplace ApproximationsDenis Huseljic, Marek Herde, Lukas Rauch et al.
Since training deep neural networks takes significant computational resources, extending the training dataset with new data is difficult, as it typically requires complete retraining. Moreover, specific applications do not allow costly retraining due to time or computational constraints. We address this issue by proposing a novel Bayesian update method for deep neural networks by using a last-layer Laplace approximation. Concretely, we leverage second-order optimization techniques on the Gaussian posterior distribution of a Laplace approximation, computing the inverse Hessian matrix in closed form. This way, our method allows for fast and effective updates upon the arrival of new data in a stationary setting. A large-scale evaluation study across different data modalities confirms that our updates are a fast and competitive alternative to costly retraining. Furthermore, we demonstrate its applicability in a deep active learning scenario by using our update to improve existing selection strategies.
CVOct 6, 2022
A Review of Uncertainty Calibration in Pretrained Object DetectorsDenis Huseljic, Marek Herde, Mehmet Muejde et al.
In the field of deep learning based computer vision, the development of deep object detection has led to unique paradigms (e.g., two-stage or set-based) and architectures (e.g., Faster-RCNN or DETR) which enable outstanding performance on challenging benchmark datasets. Despite this, the trained object detectors typically do not reliably assess uncertainty regarding their own knowledge, and the quality of their probabilistic predictions is usually poor. As these are often used to make subsequent decisions, such inaccurate probabilistic predictions must be avoided. In this work, we investigate the uncertainty calibration properties of different pretrained object detection architectures in a multi-class setting. We propose a framework to ensure a fair, unbiased, and repeatable evaluation and conduct detailed analyses assessing the calibration under distributional changes (e.g., distributional shift and application to out-of-distribution data). Furthermore, by investigating the influence of different detector paradigms, post-processing steps, and suitable choices of metrics, we deliver novel insights into why poor detector calibration emerges. Based on these insights, we are able to improve the calibration of a detector by simply finetuning its last layer.
LGApr 1, 2022
Synthetic Photovoltaic and Wind Power Forecasting DataStephan Vogt, Jens Schreiber, Bernhard Sick
Photovoltaic and wind power forecasts in power systems with a high share of renewable energy are essential in several applications. These include stable grid operation, profitable power trading, and forward-looking system planning. However, there is a lack of publicly available datasets for research on machine learning based prediction methods. This paper provides an openly accessible time series dataset with realistic synthetic power data. Other publicly and non-publicly available datasets often lack precise geographic coordinates, timestamps, or static power plant information, e.g., to protect business secrets. On the opposite, this dataset provides these. The dataset comprises 120 photovoltaic and 273 wind power plants with distinct sides all over Germany from 500 days in hourly resolution. This large number of available sides allows forecasting experiments to include spatial correlations and run experiments in transfer and multi-task learning. It includes side-specific, power source-dependent, non-synthetic input features from the ICON-EU weather model. A simulation of virtual power plants with physical models and actual meteorological measurements provides realistic synthetic power measurement time series. These time series correspond to the power output of virtual power plants at the location of the respective weather measurements. Since the synthetic time series are based exclusively on weather measurements, possible errors in the weather forecast are comparable to those in actual power data. In addition to the data description, we evaluate the quality of weather-prediction-based power forecasts by comparing simplified physical models and a machine learning model. This experiment shows that forecasts errors on the synthetic power data are comparable to real-world historical power measurements.
LGJul 10, 2023
DADO -- Low-Cost Query Strategies for Deep Active Design OptimizationJens Decke, Christian Gruhl, Lukas Rauch et al.
In this experience report, we apply deep active learning to the field of design optimization to reduce the number of computationally expensive numerical simulations. We are interested in optimizing the design of structural components, where the shape is described by a set of parameters. If we can predict the performance based on these parameters and consider only the promising candidates for simulation, there is an enormous potential for saving computing power. We present two selection strategies for self-optimization to reduce the computational cost in multi-objective design optimization problems. Our proposed methodology provides an intuitive approach that is easy to apply, offers significant improvements over random sampling, and circumvents the need for uncertainty estimation. We evaluate our strategies on a large dataset from the domain of fluid dynamics and introduce two new evaluation metrics to determine the model's performance. Findings from our evaluation highlights the effectiveness of our selection strategies in accelerating design optimization. We believe that the introduced method is easily transferable to other self-optimization problems.
CVJul 30, 2024
dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple HumansMarek Herde, Denis Huseljic, Lukas Rauch et al.
Human annotators typically provide annotated data for training machine learning models, such as neural networks. Yet, human annotations are subject to noise, impairing generalization performances. Methodological research on approaches counteracting noisy annotations requires corresponding datasets for a meaningful empirical evaluation. Consequently, we introduce a novel benchmark dataset, dopanim, consisting of about 15,750 animal images of 15 classes with ground truth labels. For approximately 10,500 of these images, 20 humans provided over 52,000 annotations with an accuracy of circa 67%. Its key attributes include (1) the challenging task of classifying doppelganger animals, (2) human-estimated likelihoods as annotations, and (3) annotator metadata. We benchmark well-known multi-annotator learning approaches using seven variants of this dataset and outline further evaluation use cases such as learning beyond hard class labels and active learning. Our dataset and a comprehensive codebase are publicly available to emulate the data collection process and to reproduce all empirical results.
CVSep 12, 2023
Active Label Refinement for Semantic Segmentation of Satellite ImagesTuan Pham Minh, Jayan Wijesingha, Daniel Kottke et al.
Remote sensing through semantic segmentation of satellite images contributes to the understanding and utilisation of the earth's surface. For this purpose, semantic segmentation networks are typically trained on large sets of labelled satellite images. However, obtaining expert labels for these images is costly. Therefore, we propose to rely on a low-cost approach, e.g. crowdsourcing or pretrained networks, to label the images in the first step. Since these initial labels are partially erroneous, we use active learning strategies to cost-efficiently refine the labels in the second step. We evaluate the active learning strategies using satellite images of Bengaluru in India, labelled with land cover and land use labels. Our experimental results suggest that an active label refinement to improve the semantic segmentation network's performance is beneficial.
LGJul 5, 2024
Graph Reinforcement Learning for Power Grids: A Comprehensive SurveyMohamed Hassouna, Clara Holzhüter, Pawel Lytaev et al.
The increasing share of renewable energy and distributed electricity generation requires the development of deep learning approaches to address the lack of flexibility inherent in traditional power grid methods. In this context, Graph Neural Networks are a promising solution due to their ability to learn from graph-structured data. Combined with Reinforcement Learning, they can be used as control approaches to determine remedial actions. This review analyses how Graph Reinforcement Learning can improve representation learning and decision-making in power grid applications, particularly transmission and distribution grids. We analyze the reviewed approaches in terms of the graph structure, the Graph Neural Network architecture, and the Reinforcement Learning approach. Although Graph Reinforcement Learning has demonstrated adaptability to unpredictable events and noisy data, its current stage is primarily proof-of-concept, and it is not yet deployable to real-world applications. We highlight the open challenges and limitations for real-world applications.
47.1LGMar 13
BoSS: A Best-of-Strategies Selector as an Oracle for Deep Active LearningDenis Huseljic, Paul Hahn, Marek Herde et al.
Active learning (AL) aims to reduce annotation costs while maximizing model performance by iteratively selecting valuable instances. While foundation models have made it easier to identify these instances, existing selection strategies still lack robustness across different models, annotation budgets, and datasets. To highlight the potential weaknesses of existing AL strategies and provide a reference point for research, we explore oracle strategies, i.e., strategies that approximate the optimal selection by accessing ground-truth information unavailable in practical AL scenarios. Current oracle strategies, however, fail to scale effectively to large datasets and complex deep neural networks. To tackle these limitations, we introduce the Best-of-Strategy Selector (BoSS), a scalable oracle strategy designed for large-scale AL scenarios. BoSS constructs a set of candidate batches through an ensemble of selection strategies and then selects the batch yielding the highest performance gain. As an ensemble of selection strategies, BoSS can be easily extended with new state-of-the-art strategies as they emerge, ensuring it remains a reliable oracle strategy in the future. Our evaluation demonstrates that i) BoSS outperforms existing oracle strategies, ii) state-of-the-art AL strategies still fall noticeably short of oracle performance, especially in large-scale datasets with many classes, and iii) one possible solution to counteract the inconsistent performance of AL strategies might be to employ an ensemble-based approach for the selection.
CVApr 13, 2024Code
Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image ClassificationDenis Huseljic, Paul Hahn, Marek Herde et al.
Deep active learning (AL) seeks to minimize the annotation costs for training deep neural networks. BAIT, a recently proposed AL strategy based on the Fisher Information, has demonstrated impressive performance across various datasets. However, BAIT's high computational and memory requirements hinder its applicability on large-scale classification tasks, resulting in current research neglecting BAIT in their evaluation. This paper introduces two methods to enhance BAIT's computational efficiency and scalability. Notably, we significantly reduce its time complexity by approximating the Fisher Information. In particular, we adapt the original formulation by i) taking the expectation over the most probable classes, and ii) constructing a binary classification task, leading to an alternative likelihood for gradient computations. Consequently, this allows the efficient use of BAIT on large-scale datasets, including ImageNet. Our unified and comprehensive evaluation across a variety of datasets demonstrates that our approximations achieve strong performance with considerably reduced time complexity. Furthermore, we provide an extensive open-source toolbox that implements recent state-of-the-art AL strategies, available at https://github.com/dhuseljic/dal-toolbox.
CVFeb 11Code
DD-MDN: Human Trajectory Forecasting with Diffusion-Based Dual Mixture Density Networks and Uncertainty Self-CalibrationManuel Hetzel, Kerim Turacan, Hannes Reichert et al.
Human Trajectory Forecasting (HTF) predicts future human movements from past trajectories and environmental context, with applications in Autonomous Driving, Smart Surveillance, and Human-Robot Interaction. While prior work has focused on accuracy, social interaction modeling, and diversity, little attention has been paid to uncertainty modeling, calibration, and forecasts from short observation periods, which are crucial for downstream tasks such as path planning and collision avoidance. We propose DD-MDN, an end-to-end probabilistic HTF model that combines high positional accuracy, calibrated uncertainty, and robustness to short observations. Using a few-shot denoising diffusion backbone and a dual mixture density network, our method learns self-calibrated residence areas and probability-ranked anchor paths, from which diverse trajectory hypotheses are derived, without predefined anchors or endpoints. Experiments on the ETH/UCY, SDD, inD, and IMPTC datasets demonstrate state-of-the-art accuracy, robustness at short observation intervals, and reliable uncertainty modeling. The code is available at: https://github.com/kav-institute/ddmdn.
ROApr 30, 2025Code
Real Time Semantic Segmentation of High Resolution Automotive LiDAR ScansHannes Reichert, Benjamin Serfling, Elijah Schüssler et al.
In recent studies, numerous previous works emphasize the importance of semantic segmentation of LiDAR data as a critical component to the development of driver-assistance systems and autonomous vehicles. However, many state-of-the-art methods are tested on outdated, lower-resolution LiDAR sensors and struggle with real-time constraints. This study introduces a novel semantic segmentation framework tailored for modern high-resolution LiDAR sensors that addresses both accuracy and real-time processing demands. We propose a novel LiDAR dataset collected by a cutting-edge automotive 128 layer LiDAR in urban traffic scenes. Furthermore, we propose a semantic segmentation method utilizing surface normals as strong input features. Our approach is bridging the gap between cutting-edge research and practical automotive applications. Additionaly, we provide a Robot Operating System (ROS2) implementation that we operate on our research vehicle. Our dataset and code are publicly available: https://github.com/kav-institute/SemanticLiDAR.
LGMay 6, 2024Code
Annot-Mix: Learning with Noisy Class Labels from Multiple Annotators via a Mixup ExtensionMarek Herde, Lukas Lührs, Denis Huseljic et al.
Training with noisy class labels impairs neural networks' generalization performance. In this context, mixup is a popular regularization technique to improve training robustness by making memorizing false class labels more difficult. However, mixup neglects that, typically, multiple annotators, e.g., crowdworkers, provide class labels. Therefore, we propose an extension of mixup, which handles multiple class labels per instance while considering which class label originates from which annotator. Integrated into our multi-annotator classification framework annot-mix, it performs superiorly to eight state-of-the-art approaches on eleven datasets with noisy class labels provided either by human or simulated annotators. Our code is publicly available through our repository at https://github.com/ies-research/annot-mix.
LGSep 18, 2024
Location based Probabilistic Load Forecasting of EV Charging Sites: Deep Transfer Learning with Multi-Quantile Temporal Convolutional NetworkMohammad Wazed Ali, Asif bin Mustafa, Md. Aukerul Moin Shuvo et al.
Electrification of vehicles is a potential way of reducing fossil fuel usage and thus lessening environmental pollution. Electric Vehicles (EVs) of various types for different transport modes (including air, water, and land) are evolving. Moreover, different EV user groups (commuters, commercial or domestic users, drivers) may use different charging infrastructures (public, private, home, and workplace) at various times. Therefore, usage patterns and energy demand are very stochastic. Characterizing and forecasting the charging demand of these diverse EV usage profiles is essential in preventing power outages. Previously developed data-driven load models are limited to specific use cases and locations. None of these models are simultaneously adaptive enough to transfer knowledge of day-ahead forecasting among EV charging sites of diverse locations, trained with limited data, and cost-effective. This article presents a location-based load forecasting of EV charging sites using a deep Multi-Quantile Temporal Convolutional Network (MQ-TCN) to overcome the limitations of earlier models. We conducted our experiments on data from four charging sites, namely Caltech, JPL, Office-1, and NREL, which have diverse EV user types like students, full-time and part-time employees, random visitors, etc. With a Prediction Interval Coverage Probability (PICP) score of 93.62\%, our proposed deep MQ-TCN model exhibited a remarkable 28.93\% improvement over the XGBoost model for a day-ahead load forecasting at the JPL charging site. By transferring knowledge with the inductive Transfer Learning (TL) approach, the MQ-TCN model achieved a 96.88\% PICP score for the load forecasting task at the NREL site using only two weeks of data.
LGApr 16, 2024
AudioProtoPNet: An interpretable deep learning model for bird sound classificationRené Heinrich, Lukas Rauch, Bernhard Sick et al.
Deep learning models have significantly advanced acoustic bird monitoring by being able to recognize numerous bird species based on their vocalizations. However, traditional deep learning models are black boxes that provide no insight into their underlying computations, limiting their usefulness to ornithologists and machine learning engineers. Explainable models could facilitate debugging, knowledge discovery, trust, and interdisciplinary collaboration. This study introduces AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification. It is an inherently interpretable model that uses a ConvNeXt backbone to extract embeddings, with the classification layer replaced by a prototype learning classifier trained on these embeddings. The classifier learns prototypical patterns of each bird species' vocalizations from spectrograms of training instances. During inference, audio recordings are classified by comparing them to the learned prototypes in the embedding space, providing explanations for the model's decisions and insights into the most informative embeddings of each bird species. The model was trained on the BirdSet training dataset, which consists of 9,734 bird species and over 6,800 hours of recordings. Its performance was evaluated on the seven test datasets of BirdSet, covering different geographical regions. AudioProtoPNet outperformed the state-of-the-art model Perch, achieving an average AUROC of 0.90 and a cmAP of 0.42, with relative improvements of 7.1% and 16.7% over Perch, respectively. These results demonstrate that even for the challenging task of multi-label bird sound classification, it is possible to develop powerful yet inherently interpretable deep learning models that provide valuable insights for ornithologists and machine learning engineers.
SDMar 15, 2024
BirdSet: A Large-Scale Dataset for Audio Classification in Avian BioacousticsLukas Rauch, Raphael Schwinger, Moritz Wirth et al.
Deep learning (DL) has greatly advanced audio classification, yet the field is limited by the scarcity of large-scale benchmark datasets that have propelled progress in other domains. While AudioSet is a pivotal step to bridge this gap as a universal-domain dataset, its restricted accessibility and limited range of evaluation use cases challenge its role as the sole resource. Therefore, we introduce BirdSet, a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. BirdSet surpasses AudioSet with over 6,800 recording hours ($\uparrow\!17\%$) from nearly 10,000 classes ($\uparrow\!18\times$) for training and more than 400 hours ($\uparrow\!7\times$) across eight strongly labeled evaluation datasets. It serves as a versatile resource for use cases such as multi-label classification, covariate shift or self-supervised learning. We benchmark six well-known DL models in multi-label classification across three distinct training scenarios and outline further evaluation use cases in audio classification. We host our dataset on Hugging Face for easy accessibility and offer an extensive codebase to reproduce our results.
LGApr 17, 2025
Can Masked Autoencoders Also Listen to Birds?Lukas Rauch, René Heinrich, Ilyass Moummad et al.
Masked Autoencoders (MAEs) learn rich semantic representations in audio classification through an efficient self-supervised reconstruction task. However, general-purpose models fail to generalize well when applied directly to fine-grained audio domains. Specifically, bird-sound classification requires distinguishing subtle inter-species differences and managing high intra-species acoustic variability, revealing the performance limitations of general-domain Audio-MAEs. This work demonstrates that bridging this domain gap domain gap requires full-pipeline adaptation, not just domain-specific pretraining data. We systematically revisit and adapt the pretraining recipe, fine-tuning methods, and frozen feature utilization to bird sounds using BirdSet, a large-scale bioacoustic dataset comparable to AudioSet. Our resulting Bird-MAE achieves new state-of-the-art results in BirdSet's multi-label classification benchmark. Additionally, we introduce the parameter-efficient prototypical probing, enhancing the utility of frozen MAE representations and closely approaching fine-tuning performance in low-resource settings. Bird-MAE's prototypical probes outperform linear probing by up to 37 percentage points in mean average precision and narrow the gap to fine-tuning across BirdSet downstream tasks. Bird-MAE also demonstrates robust few-shot capabilities with prototypical probing in our newly established few-shot benchmark on BirdSet, highlighting the potential of tailored self-supervised learning pipelines for fine-grained audio domains.
LGMar 19, 2025
Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning ApproachMohamed Hassouna, Clara Holzhüter, Malte Lehna et al.
The rising proportion of renewable energy in the electricity mix introduces significant operational challenges for power grid operators. Effective power grid management demands adaptive decision-making strategies capable of handling dynamic conditions. With the increase in complexity, more and more Deep Learning (DL) approaches have been proposed to find suitable grid topologies for congestion management. In this work, we contribute to this research by introducing a novel Imitation Learning (IL) approach that leverages soft labels derived from simulated topological action outcomes, thereby capturing multiple viable actions per state. Unlike traditional IL methods that rely on hard labels to enforce a single optimal action, our method constructs soft labels that capture the effectiveness of actions that prove suitable in resolving grid congestion. To further enhance decision-making, we integrate Graph Neural Networks (GNNs) to encode the structural properties of power grids, ensuring that the topology-aware representations contribute to better agent performance. Our approach significantly outperforms its hard-label counterparts as well as state-of-the-art Deep Reinforcement Learning (DRL) baseline agents. Most notably, it achieves a 17% better performance compared to the greedy expert agent from which the imitation targets were derived.
LGJan 29, 2024
Spatio-Temporal Attention Graph Neural Network for Remaining Useful Life PredictionZhixin Huang, Yujiang He, Bernhard Sick
Remaining useful life prediction plays a crucial role in the health management of industrial systems. Given the increasing complexity of systems, data-driven predictive models have attracted significant research interest. Upon reviewing the existing literature, it appears that many studies either do not fully integrate both spatial and temporal features or employ only a single attention mechanism. Furthermore, there seems to be inconsistency in the choice of data normalization methods, particularly concerning operating conditions, which might influence predictive performance. To bridge these observations, this study presents the Spatio-Temporal Attention Graph Neural Network. Our model combines graph neural networks and temporal convolutional neural networks for spatial and temporal feature extraction, respectively. The cascade of these extractors, combined with multi-head attention mechanisms for both spatio-temporal dimensions, aims to improve predictive precision and refine model explainability. Comprehensive experiments were conducted on the C-MAPSS dataset to evaluate the impact of unified versus clustering normalization. The findings suggest that our model performs state-of-the-art results using only the unified normalization. Additionally, when dealing with datasets with multiple operating conditions, cluster normalization enhances the performance of our proposed model by up to 27%.
LGNov 27, 2025
Cleaning the Pool: Progressive Filtering of Unlabeled Pools in Deep Active LearningDenis Huseljic, Marek Herde, Lukas Rauch et al.
Existing active learning (AL) strategies capture fundamentally different notions of data value, e.g., uncertainty or representativeness. Consequently, the effectiveness of strategies can vary substantially across datasets, models, and even AL cycles. Committing to a single strategy risks suboptimal performance, as no single strategy dominates throughout the entire AL process. We introduce REFINE, an ensemble AL method that combines multiple strategies without knowing in advance which will perform best. In each AL cycle, REFINE operates in two stages: (1) Progressive filtering iteratively refines the unlabeled pool by considering an ensemble of AL strategies, retaining promising candidates capturing different notions of value. (2) Coverage-based selection then chooses a final batch from this refined pool, ensuring all previously identified notions of value are accounted for. Extensive experiments across 6 classification datasets and 3 foundation models show that REFINE consistently outperforms individual strategies and existing ensemble methods. Notably, progressive filtering serves as a powerful preprocessing step that improves the performance of any individual AL strategy applied to the refined pool, which we demonstrate on an audio spectrogram classification use case. Finally, the ensemble of REFINE can be easily extended with upcoming state-of-the-art AL strategies.
SDSep 29, 2025
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio ClassificationLukas Rauch, René Heinrich, Houtan Ghaffari et al.
Although probing frozen models has become a standard evaluation paradigm, self-supervised learning in audio defaults to fine-tuning. A key reason is that global pooling creates an information bottleneck causing linear probes to misrepresent the embedding quality: The $\texttt{cls}$-token discards crucial token information about dispersed, localized events in multi-label audio. This weakness is rooted in the mismatch between the pretraining objective (operating globally) and the downstream task (localized events). Across a comprehensive benchmark of 13 datasets and 6 spectrogram-based encoders, we first investigate the global pooling bottleneck. We then introduce binarized prototypical probes: a lightweight and simple pooling method that learns prototypes to perform class-wise information aggregation. Despite its simplicity, our method notably outperforms linear and attentive probing. Our work establishes probing as a competitive and efficient paradigm for evaluating audio SSL models, challenging the reliance on costly fine-tuning.
LGJul 18, 2025
Adversarial Training Improves Generalization Under Distribution Shifts in BioacousticsRené Heinrich, Lukas Rauch, Bernhard Sick et al.
Adversarial training is a promising strategy for enhancing model robustness against adversarial attacks. However, its impact on generalization under substantial data distribution shifts in audio classification remains largely unexplored. To address this gap, this work investigates how different adversarial training strategies improve generalization performance and adversarial robustness in audio classification. The study focuses on two model architectures: a conventional convolutional neural network (ConvNeXt) and an inherently interpretable prototype-based model (AudioProtoPNet). The approach is evaluated using a challenging bird sound classification benchmark. This benchmark is characterized by pronounced distribution shifts between training and test data due to varying environmental conditions and recording methods, a common real-world challenge. The investigation explores two adversarial training strategies: one based on output-space attacks that maximize the classification loss function, and another based on embedding-space attacks designed to maximize embedding dissimilarity. These attack types are also used for robustness evaluation. Additionally, for AudioProtoPNet, the study assesses the stability of its learned prototypes under targeted embedding-space attacks. Results show that adversarial training, particularly using output-space attacks, improves clean test data performance by an average of 10.5% relative and simultaneously strengthens the adversarial robustness of the models. These findings, although derived from the bird sound domain, suggest that adversarial training holds potential to enhance robustness against both strong distribution shifts and adversarial attacks in challenging audio classification settings.
LGJun 12, 2025
Graph Neural Networks for Automatic Addition of Optimizing Components in Printed Circuit Board SchematicsPascal Plettenberg, André Alcalde, Bernhard Sick et al.
The design and optimization of Printed Circuit Board (PCB) schematics is crucial for the development of high-quality electronic devices. Thereby, an important task is to optimize drafts by adding components that improve the robustness and reliability of the circuit, e.g., pull-up resistors or decoupling capacitors. Since there is a shortage of skilled engineers and manual optimizations are very time-consuming, these best practices are often neglected. However, this typically leads to higher costs for troubleshooting in later development stages as well as shortened product life cycles, resulting in an increased amount of electronic waste that is difficult to recycle. Here, we present an approach for automating the addition of new components into PCB schematics by representing them as bipartite graphs and utilizing a node pair prediction model based on Graph Neural Networks (GNNs). We apply our approach to three highly relevant PCB design optimization tasks and compare the performance of several popular GNN architectures on real-world datasets labeled by human experts. We show that GNNs can solve these problems with high accuracy and demonstrate that our approach offers the potential to automate PCB design optimizations in a time- and cost-efficient manner.
CLMay 18, 2025
No Free Lunch in Active Learning: LLM Embedding Quality Dictates Query Strategy SuccessLukas Rauch, Moritz Wirth, Denis Huseljic et al.
The advent of large language models (LLMs) capable of producing general-purpose representations lets us revisit the practicality of deep active learning (AL): By leveraging frozen LLM embeddings, we can mitigate the computational costs of iteratively fine-tuning large backbones. This study establishes a benchmark and systematically investigates the influence of LLM embedding quality on query strategies in deep AL. We employ five top-performing models from the massive text embedding benchmark (MTEB) leaderboard and two baselines for ten diverse text classification tasks. Our findings reveal key insights: First, initializing the labeled pool using diversity-based sampling synergizes with high-quality embeddings, boosting performance in early AL iterations. Second, the choice of the optimal query strategy is sensitive to embedding quality. While the computationally inexpensive Margin sampling can achieve performance spikes on specific datasets, we find that strategies like Badge exhibit greater robustness across tasks. Importantly, their effectiveness is often enhanced when paired with higher-quality embeddings. Our results emphasize the need for context-specific evaluation of AL strategies, as performance heavily depends on embedding quality and the target task.
LGApr 12, 2025
crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy LabelsMarek Herde, Lukas Lührs, Denis Huseljic et al.
Crowdworking is a cost-efficient solution for acquiring class labels. Since these labels are subject to noise, various approaches to learning from crowds have been proposed. Typically, these approaches are evaluated with default hyperparameter configurations, resulting in unfair and suboptimal performance, or with hyperparameter configurations tuned via a validation set with ground truth class labels, representing an often unrealistic scenario. Moreover, both setups can produce different approach rankings, complicating study comparisons. Therefore, we introduce crowd-hpo as a framework for evaluating approaches to learning from crowds in combination with criteria to select well-performing hyperparameter configurations with access only to noisy crowd-labeled validation data. Extensive experiments with neural networks demonstrate that these criteria select hyperparameter configurations, which improve the learning from crowd approaches' generalization performances, measured on separate test sets with ground truth labels. Hence, incorporating such criteria into experimental studies is essential for enabling fairer and more realistic benchmarking.
SDJun 26, 2024
Towards Deep Active Learning in Avian BioacousticsLukas Rauch, Denis Huseljic, Moritz Wirth et al.
Passive acoustic monitoring (PAM) in avian bioacoustics enables cost-effective and extensive data collection with minimal disruption to natural habitats. Despite advancements in computational avian bioacoustics, deep learning models continue to encounter challenges in adapting to diverse environments in practical PAM scenarios. This is primarily due to the scarcity of annotations, which requires labor-intensive efforts from human experts. Active learning (AL) reduces annotation cost and speed ups adaption to diverse scenarios by querying the most informative instances for labeling. This paper outlines a deep AL approach, introduces key challenges, and conducts a small-scale pilot study.
CVApr 17, 2024
Criteria for Uncertainty-based Corner Cases Detection in Instance SegmentationFlorian Heidecker, Ahmad El-Khateeb, Maarten Bieshaar et al.
The operating environment of a highly automated vehicle is subject to change, e.g., weather, illumination, or the scenario containing different objects and other participants in which the highly automated vehicle has to navigate its passengers safely. These situations must be considered when developing and validating highly automated driving functions. This already poses a problem for training and evaluating deep learning models because without the costly labeling of thousands of recordings, not knowing whether the data contains relevant, interesting data for further model training, it is a guess under which conditions and situations the model performs poorly. For this purpose, we present corner case criteria based on the predictive uncertainty. With our corner case criteria, we are able to detect uncertainty-based corner cases of an object instance segmentation model without relying on ground truth (GT) data. We evaluated each corner case criterion using the COCO and the NuImages dataset to analyze the potential of our approach. We also provide a corner case decision function that allows us to distinguish each object into True Positive (TP), localization and/or classification corner case, or False Positive (FP). We also present our first results of an iterative training cycle that outperforms the baseline and where the data added to the training dataset is selected based on the corner case decision function.
CVMay 24, 2023
Sampling-based Uncertainty Estimation for an Instance Segmentation NetworkFlorian Heidecker, Ahmad El-Khateeb, Bernhard Sick
The examination of uncertainty in the predictions of machine learning (ML) models is receiving increasing attention. One uncertainty modeling technique used for this purpose is Monte-Carlo (MC)-Dropout, where repeated predictions are generated for a single input. Therefore, clustering is required to describe the resulting uncertainty, but only through efficient clustering is it possible to describe the uncertainty from the model attached to each object. This article uses Bayesian Gaussian Mixture (BGM) to solve this problem. In addition, we investigate different values for the dropout rate and other techniques, such as focal loss and calibration, which we integrate into the Mask-RCNN model to obtain the most accurate uncertainty approximation of each instance and showcase it graphically.
FLU-DYNMay 9, 2023
Dataset of a parameterized U-bend flow for Deep Learning ApplicationsJens Decke, Olaf Wünsch, Bernhard Sick
This dataset contains 10,000 fluid flow and heat transfer simulations in U-bend shapes. Each of them is described by 28 design parameters, which are processed with the help of Computational Fluid Dynamics methods. The dataset provides a comprehensive benchmark for investigating various problems and methods from the field of design optimization. For these investigations supervised, semi-supervised and unsupervised deep learning approaches can be employed. One unique feature of this dataset is that each shape can be represented by three distinct data types including design parameter and objective combinations, five different resolutions of 2D images from the geometry and the solution variables of the numerical simulation, as well as a representation using the cell values of the numerical mesh. This third representation enables considering the specific data structure of numerical simulations for deep learning approaches. The source code and the container used to generate the data are published as part of this work.
LGFeb 14, 2022
Design of Explainability Module with Experts in the Loop for Visualization and Dynamic Adjustment of Continual LearningYujiang He, Zhixin Huang, Bernhard Sick
Continual learning can enable neural networks to evolve by learning new tasks sequentially in task-changing scenarios. However, two general and related challenges should be overcome in further research before we apply this technique to real-world applications. Firstly, newly collected novelties from the data stream in applications could contain anomalies that are meaningless for continual learning. Instead of viewing them as a new task for updating, we have to filter out such anomalies to reduce the disturbance of extremely high-entropy data for the progression of convergence. Secondly, fewer efforts have been put into research regarding the explainability of continual learning, which leads to a lack of transparency and credibility of the updated neural networks. Elaborated explanations about the process and result of continual learning can help experts in judgment and making decisions. Therefore, we propose the conceptual design of an explainability module with experts in the loop based on techniques, such as dimension reduction, visualization, and evaluation strategies. This work aims to overcome the mentioned challenges by sufficiently explaining and visualizing the identified anomalies and the updated neural network. With the help of this module, experts can be more confident in decision-making regarding anomaly filtering, dynamic adjustment of hyperparameters, data backup, etc.
LGSep 23, 2021
A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in ClassificationMarek Herde, Denis Huseljic, Bernhard Sick et al.
Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling) as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy queries annotations intelligently from annotators to train a high-performance classification model at a low annotation cost. Traditional AL strategies operate in an idealized framework. They assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond if tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ in at least one of the following three central aspects from traditional AL: (1) They explicitly consider (multiple) human annotators whose performances can be affected by various factors, such as missing expertise. (2) They generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule. (3) They take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. Therefore, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL.
LGSep 20, 2021
Description of Corner Cases in Automated Driving: Goals and ChallengesDaniel Bogdoll, Jasmin Breitenstein, Florian Heidecker et al.
Scaling the distribution of automated vehicles requires handling various unexpected and possibly dangerous situations, termed corner cases (CC). Since many modules of automated driving systems are based on machine learning (ML), CC are an essential part of the data for their development. However, there is only a limited amount of CC data in large-scale data collections, which makes them challenging in the context of ML. With a better understanding of CC, offline applications, e.g., dataset analysis, and online methods, e.g., improved performance of automated driving systems, can be improved. While there are knowledge-based descriptions and taxonomies for CC, there is little research on machine-interpretable descriptions. In this extended abstract, we will give a brief overview of the challenges and goals of such a description.
DATA-ANAug 31, 2021
Artificial intelligence for online characterization of ultrashort X-ray free-electron laser pulsesKristina Dingel, Thorsten Otto, Lutz Marder et al.
X-ray free-electron lasers (XFELs) as the world's brightest light sources provide ultrashort X-ray pulses with a duration typically in the order of femtoseconds. Recently, they have approached and entered the attosecond regime, which holds new promises for single-molecule imaging and studying nonlinear and ultrafast phenomena such as localized electron dynamics. The technological evolution of XFELs toward well-controllable light sources for precise metrology of ultrafast processes has been, however, hampered by the diagnostic capabilities for characterizing X-ray pulses at the attosecond frontier. In this regard, the spectroscopic technique of photoelectron angular streaking has successfully proven how to non-destructively retrieve the exact time-energy structure of XFEL pulses on a single-shot basis. By using artificial intelligence techniques, in particular convolutional neural networks, we here show how this technique can be leveraged from its proof-of-principle stage toward routine diagnostics even at high-repetition-rate XFELs, thus enhancing and refining their scientific accessibility in all related disciplines.
LGAug 9, 2021
Probabilistic Active Learning for Active Class SelectionDaniel Kottke, Georg Krempl, Marianne Stecklina et al.
In machine learning, active class selection (ACS) algorithms aim to actively select a class and ask the oracle to provide an instance for that class to optimize a classifier's performance while minimizing the number of requests. In this paper, we propose a new algorithm (PAL-ACS) that transforms the ACS problem into an active learning task by introducing pseudo instances. These are used to estimate the usefulness of an upcoming instance for each class using the performance gain model from probabilistic active learning. Our experimental evaluation (on synthetic and real data) shows the advantages of our algorithm compared to state-of-the-art algorithms. It effectively prefers the sampling of difficult classes and thereby improves the classification performance.
CVJun 30, 2021
Cyclist Trajectory Forecasts by Incorporation of Multi-View Video InformationStefan Zernetsch, Oliver Trupp, Viktor Kress et al.
This article presents a novel approach to incorporate visual cues from video-data from a wide-angle stereo camera system mounted at an urban intersection into the forecast of cyclist trajectories. We extract features from image and optical flow (OF) sequences using 3D convolutional neural networks (3D-ConvNet) and combine them with features extracted from the cyclist's past trajectory to forecast future cyclist positions. By the use of additional information, we are able to improve positional accuracy by about 7.5 % for our test dataset and by up to 22 % for specific motion types compared to a method solely based on past trajectories. Furthermore, we compare the use of image sequences to the use of OF sequences as additional information, showing that OF alone leads to significant improvements in positional accuracy. By training and testing our methods using a real-world dataset recorded at a heavily frequented public intersection and evaluating the methods' runtimes, we demonstrate the applicability in real traffic scenarios. Our code and parts of our dataset are made publicly available.
CVJun 4, 2021
Pose and Semantic Map Based Probabilistic Forecast of Vulnerable Road Users' TrajectoriesViktor Kress, Fabian Jeske, Stefan Zernetsch et al.
In this article, an approach for probabilistic trajectory forecasting of vulnerable road users (VRUs) is presented, which considers past movements and the surrounding scene. Past movements are represented by 3D poses reflecting the posture and movements of individual body parts. The surrounding scene is modeled in the form of semantic maps showing, e.g., the course of streets, sidewalks, and the occurrence of obstacles. The forecasts are generated in grids discretizing the space and in the form of arbitrary discrete probability distributions. The distributions are evaluated in terms of their reliability, sharpness, and positional accuracy. We compare our method with an approach that provides forecasts in the form of Gaussian distributions and discuss the respective advantages and disadvantages. Thereby, we investigate the impact of using poses and semantic maps. With a technique called spatial label smoothing, our approach achieves reliable forecasts. Overall, the poses have a positive impact on the forecasts. The semantic maps offer the opportunity to adapt the probability distributions to the individual situation, although at the considered forecasted time horizon of 2.52 s they play a minor role compared to the past movements of the VRU. Our method is evaluated on a dataset recorded in inner-city traffic using a research vehicle. The dataset is made publicly available.
ROMay 14, 2021
Towards Sensor Data Abstraction of Autonomous Vehicle Perception SystemsHannes Reichert, Lukas Lang, Kevin Rösch et al.
Full-stack autonomous driving perception modules usually consist of data-driven models based on multiple sensor modalities. However, these models might be biased to the sensor setup used for data acquisition. This bias can seriously impair the perception models' transferability to new sensor setups, which continuously occur due to the market's competitive nature. We envision sensor data abstraction as an interface between sensor data and machine learning applications for highly automated vehicles (HAD). For this purpose, we review the primary sensor modalities, camera, lidar, and radar, published in autonomous-driving related datasets, examine single sensor abstraction and abstraction of sensor setups, and identify critical paths towards an abstraction of sensor data from multiple perception configurations.
LGMay 4, 2021
Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and AutoencodersFelix Möller, Diego Botache, Denis Huseljic et al.
Deep neural networks often suffer from overconfidence which can be partly remedied by improved out-of-distribution detection. For this purpose, we propose a novel approach that allows for the generation of out-of-distribution datasets based on a given in-distribution dataset. This new dataset can then be used to improve out-of-distribution detection for the given dataset and machine learning task at hand. The samples in this dataset are with respect to the feature space close to the in-distribution dataset and therefore realistic and plausible. Hence, this dataset can also be used to safeguard neural networks, i.e., to validate the generalization performance. Our approach first generates suitable representations of an in-distribution dataset using an autoencoder and then transforms them using our novel proposed Soft Brownian Offset method. After transformation, the decoder part of the autoencoder allows for the generation of these implicit out-of-distribution samples. This newly generated dataset then allows for mixing with other datasets and thus improved training of an out-of-distribution classifier, increasing its performance. Experimentally, we show that our approach is promising for time series using synthetic data. Using our new method, we also show in a quantitative case study that we can improve the out-of-distribution detection for the MNIST dataset. Finally, we provide another case study on the synthetic generation of out-of-distribution trajectories, which can be used to validate trajectory prediction algorithms for automated driving.
CVApr 19, 2021
Cyclist Intention Detection: A Probabilistic ApproachStefan Zernetsch, Hannes Reichert, Viktor Kress et al.
This article presents a holistic approach for probabilistic cyclist intention detection. A basic movement detection based on motion history images (MHI) and a residual convolutional neural network (ResNet) are used to estimate probabilities for the current cyclist motion state. These probabilities are used as weights in a probabilistic ensemble trajectory forecast. The ensemble consists of specialized models, which produce individual forecasts in the form of Gaussian distributions under the assumption of a certain motion state of the cyclist (e.g. cyclist is starting or turning left). By weighting the specialized models, we create forecasts in the from of Gaussian mixtures that define regions within which the cyclists will reside with a certain probability. To evaluate our method, we rate the reliability, sharpness, and positional accuracy of our forecasted distributions. We compare our method to a single model approach which produces forecasts in the form of Gaussian distributions and show that our method is able to produce more reliable and sharper outputs while retaining comparable positional accuracy. Both methods are evaluated using a dataset created at a public traffic intersection. Our code and the dataset are made publicly available.