Sebastian Fischer

CL
h-index68
8papers
20citations
Novelty47%
AI Score49

8 Papers

MLSep 27, 2024
Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study

Hannah Schulz-Kümpel, Sebastian Fischer, Roman Hornung et al.

When assessing the quality of prediction models in machine learning, confidence intervals (CIs) for the generalization error, which measures predictive performance, are a crucial tool. Luckily, there exist many methods for computing such CIs and new promising approaches are continuously being proposed. Typically, these methods combine various resampling procedures, most popular among them cross-validation and bootstrapping, with different variance estimation techniques. Unfortunately, however, there is currently no consensus on when any of these combinations may be most reliably employed and how they generally compare. In this work, we conduct a large-scale study comparing CIs for the generalization error, the first one of such size, where we empirically evaluate 13 different CI methods on a total of 19 tabular regression and classification problems, using seven different inducers and a total of eight loss functions. We give an overview of the methodological foundations and inherent challenges of constructing CIs for the generalization error and provide a concise review of all 13 methods in a unified framework. Finally, the CI methods are evaluated in terms of their relative coverage frequency, width, and runtime. Based on these findings, we can identify a subset of methods that we would recommend. We also publish the datasets as a benchmarking suite on OpenML and our code on GitHub to serve as a basis for further studies.

20.2MLApr 20
mlr3torch: A Deep Learning Framework in R based on mlr3 and torch

Sebastian Fischer, Lukas Burk, Carson Zhang et al.

Deep learning (DL) has become a cornerstone of modern machine learning (ML) praxis. We introduce the R package mlr3torch, which is an extensible DL framework for the mlr3 ecosystem. It is built upon the torch package, and simplifies the definition, training, and evaluation of neural networks for both tabular data and generic tensors (e.g., images) for classification and regression. The package implements predefined architectures, and torch models can easily be converted to mlr3 learners. It also allows users to define neural networks as graphs. This representation is based on the graph language defined in mlr3pipelines and allows users to define the entire modeling workflow, including preprocessing, data augmentation, and network architecture, in a single graph. Through its integration into the mlr3 ecosystem, the package allows for convenient resampling, benchmarking, preprocessing, and more. We explain the package's design and features and show how to customize and extend it to new problems. Furthermore, we demonstrate the package's capabilities using three use cases, namely hyperparameter tuning, fine-tuning, and defining architectures for multimodal data. Finally, we present some runtime benchmarks.

CLMar 6, 2024Code
GPTopic: Dynamic and Interactive Topic Representations

Arik Reuter, Bishnu Khadka, Anton Thielmann et al.

Topic modeling seems to be almost synonymous with generating lists of top words to represent topics within large text corpora. However, deducing a topic from such list of individual terms can require substantial expertise and experience, making topic modelling less accessible to people unfamiliar with the particularities and pitfalls of top-word interpretation. A topic representation limited to top-words might further fall short of offering a comprehensive and easily accessible characterization of the various aspects, facets and nuances a topic might have. To address these challenges, we introduce GPTopic, a software package that leverages Large Language Models (LLMs) to create dynamic, interactive topic representations. GPTopic provides an intuitive chat interface for users to explore, analyze, and refine topics interactively, making topic modeling more accessible and comprehensive. The corresponding code is available here: https://github.com/ArikReuter/TopicGPT.

40.6ARMar 23
Quantifying Uncertainty in FMEDA Safety Metrics: An Error Propagation Approach for Enhanced ASIC Verification

Antonino Armato, Christian Kehl, Sebastian Fischer

Accurate and reliable safety metrics are paramount for functional safety verification of ASICs in automotive systems. Traditional FMEDA (Failure Modes, Effects, and Diagnostic Analysis) metrics, such as SPFM (Single Point Fault Metric) and LFM (Latent Fault Metric), depend on the precision of failure mode distribution (FMD) and diagnostic coverage (DC) estimations. This reliance can often leads to significant, unquantified uncertainties and a dependency on expert judgment, compromising the quality of the safety analysis. This paper proposes a novel approach that introduces error propagation theory into the calculation of FMEDA safety metrics. By quantifying the maximum deviation and providing confidence intervals for SPFM and LFM, our method offers a direct measure of analysis quality. Furthermore, we introduce an Error Importance Identifier (EII) to pinpoint the primary sources of uncertainty, guiding targeted improvements. This approach significantly enhances the transparency and trustworthiness of FMEDA, enabling more robust ASIC safety verification for ISO 26262 compliance, addressing a longstanding open question in the functional safety community.

GNSep 2, 2025Code
Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data

John Zobolas, Anne-Marie George, Alberto López et al.

Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.

CRMar 6
An Integrated Failure and Threat Mode and Effect Analysis (FTMEA) Framework with Quantified Cross-Domain Correlation Factors for Automotive Semiconductors

Antonino Armato, Marzana Khatun, Sebastian Fischer

The automotive industry faces increasing challenges in ensuring both functional safety (FuSa) and cybersecurity for complex semiconductor devices. Traditional Failure Mode and Effects Analysis (FMEA) primarily addresses safety-related failure modes, often overlooking synergistic vulnerabilities and shared consequences with cybersecurity threats. This paper introduces an Integrated Failure and Threat Mode and Effect Analysis (FTMEA) framework that systematically co-analyzes FuSa and cybersecurity. A cornerstone of this framework is the introduction of rigorously defined Cross-Domain Correlation Factors (CDCFs), which quantify the interdependencies and mutual influences between safety-related failures and cybersecurity threats. These factors are derived from a combination of structured expert knowledge, static structural analysis metrics (e.g., Controllability/Observability), and validated against empirical data from fault/attack injection campaigns. We propose a modified Risk Priority Number (RPN) calculation that systematically integrates these correlation factors, enabling a more accurate and transparent prioritization of risks that span both domains. A detailed case study involving an automotive ASIC configuration register proves the practical application of the FTMEA. We present explicit mapping tables, quantitative CDCF values, and a comparative analysis against a baseline FMEA/TARA (Threat Analysis and Risk Assessment), illustrating how the integrated approach uncovers previously masked cross-domain risks, improves mitigation strategy effectiveness, and provides a clear quantitative justification for the derived correlation values. This framework offers a unified, traceable, methodology for risk assessment in critical automotive systems, thereby overcoming the limitations of conventional analyses and promoting optimized, cross-disciplinary development.

LGDec 23, 2020
BENN: Bias Estimation Using Deep Neural Network

Amit Giloni, Edita Grolman, Tanja Hagemann et al.

The need to detect bias in machine learning (ML) models has led to the development of multiple bias detection methods, yet utilizing them is challenging since each method: i) explores a different ethical aspect of bias, which may result in contradictory output among the different methods, ii) provides an output of a different range/scale and therefore, can't be compared with other methods, and iii) requires different input, and therefore a human expert needs to be involved to adjust each method according to the examined model. In this paper, we present BENN -- a novel bias estimation method that uses a pretrained unsupervised deep neural network. Given a ML model and data samples, BENN provides a bias estimation for every feature based on the model's predictions. We evaluated BENN using three benchmark datasets and one proprietary churn prediction model used by a European Telco and compared it with an ensemble of 21 existing bias estimation methods. Evaluation results highlight the significant advantages of BENN over the ensemble, as it is generic (i.e., can be applied to any ML model) and there is no need for a domain expert, yet it provides bias estimations that are aligned with those of the ensemble.

CLMar 25, 2018
Pay More Attention - Neural Architectures for Question-Answering

Zia Hasan, Sebastian Fischer

Machine comprehension is a representative task of natural language understanding. Typically, we are given context paragraph and the objective is to answer a question that depends on the context. Such a problem requires to model the complex interactions between the context paragraph and the question. Lately, attention mechanisms have been found to be quite successful at these tasks and in particular, attention mechanisms with attention flow from both context-to-question and question-to-context have been proven to be quite useful. In this paper, we study two state-of-the-art attention mechanisms called Bi-Directional Attention Flow (BiDAF) and Dynamic Co-Attention Network (DCN) and propose a hybrid scheme combining these two architectures that gives better overall performance. Moreover, we also suggest a new simpler attention mechanism that we call Double Cross Attention (DCA) that provides better results compared to both BiDAF and Co-Attention mechanisms while providing similar performance as the hybrid scheme. The objective of our paper is to focus particularly on the attention layer and to suggest improvements on that. Our experimental evaluations show that both our proposed models achieve superior results on the Stanford Question Answering Dataset (SQuAD) compared to BiDAF and DCN attention mechanisms.