Stefan Lüdtke

LG
h-index8
23papers
91citations
Novelty48%
AI Score50

23 Papers

89.0LGJun 1Code
TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks

Andrej Tschalzev, Nick Erickson, Yuyang Wang et al.

Progress in tabular machine learning has largely focused on increasingly sophisticated model architectures. At the same time, feature engineering remains a critical yet underexplored component of real-world modeling pipelines that is entirely absent from modern benchmarks, which creates an unquantified evaluation gap. In this work, we introduce TabPrep, a lightweight preprocessing pipeline composed of feature generators that are carefully designed to target three specific structural data patterns. We show that many widely used model classes exhibit predictable blind spots to these patterns and that systematic feature engineering alone can establish new peak performance. Across the TabArena benchmark, integrating TabPrep into model training and tuning consistently improves performance for tree-based, neural, linear, and foundation models, often surpassing gains achieved by model-centric innovations alone. TabPrep outperforms previous automated feature engineering approaches in performance, efficiency, and applicability across datasets, enabling integration into large-scale benchmarks. By releasing TabPrep (see https://github.com/atschalz/tabprep), we enable researchers to integrate feature engineering into their benchmarking setup, filling a longstanding gap in tabular evaluations.

LGJul 2, 2024Code
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data

Andrej Tschalzev, Sascha Marton, Stefan Lüdtke et al.

Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of model-centric evaluation setups with overly standardized data preprocessing. This paper demonstrates that such model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering. Therefore, we propose a data-centric evaluation framework. We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset. We conduct experiments with different preprocessing pipelines and hyperparameter optimization (HPO) regimes to quantify the impact of model selection, HPO, feature engineering, and test-time adaptation. Our main findings are: 1. After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces. 2. Recent models, despite their measurable progress, still significantly benefit from manual feature engineering. This holds true for both tree-based models and neural networks. 3. While tabular data is typically considered static, samples are often collected over time, and adapting to distribution shifts can be important even in supposedly static data. These insights suggest that research efforts should be directed toward a data-centric perspective, acknowledging that tabular data requires feature engineering and often exhibits temporal characteristics. Our framework is available under: https://github.com/atschalz/dc_tabeval.

LGSep 29, 2023Code
GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data

Sascha Marton, Stefan Lüdtke, Christian Bartelt et al.

Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose $\text{GRANDE}$, $\text{GRA}$die$\text{N}$t-Based $\text{D}$ecision Tree $\text{E}$nsembles, a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. GRANDE is based on a dense representation of tree ensembles, which affords to use backpropagation with a straight-through operator to jointly optimize all model parameters. Our method combines axis-aligned splits, which is a useful inductive bias for tabular data, with the flexibility of gradient-based optimization. Furthermore, we introduce an advanced instance-wise weighting that facilitates learning representations for both, simple and complex relations, within a single model. We conducted an extensive evaluation on a predefined benchmark with 19 classification datasets and demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets. The method is available under: https://github.com/s-marton/GRANDE

LGJun 10, 2022
Explaining Neural Networks without Access to Training Data

Sascha Marton, Stefan Lüdtke, Christian Bartelt et al.

We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety issues. Recently, $\mathcal{I}$-Nets have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the $\mathcal{I}$-Net framework to the cases of standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and design of the corresponding $\mathcal{I}$-Net output layers. Furthermore, we make $\mathcal{I}$-Nets applicable to real-world tasks by considering more realistic distributions when generating the $\mathcal{I}$-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible.

LGAug 16, 2024Code
Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization

Sascha Marton, Tim Grams, Florian Vogt et al.

Reinforcement learning (RL) has seen significant success across various domains, but its adoption is often limited by the black-box nature of neural network policies, making them difficult to interpret. In contrast, symbolic policies allow representing decision-making strategies in a compact and interpretable way. However, learning symbolic policies directly within on-policy methods remains challenging. In this paper, we introduce SYMPOL, a novel method for SYMbolic tree-based on-POLicy RL. SYMPOL employs a tree-based model integrated with a policy gradient method, enabling the agent to learn and adapt its actions while maintaining a high level of interpretability. We evaluate SYMPOL on a set of benchmark RL tasks, demonstrating its superiority over alternative tree-based RL approaches in terms of performance and interpretability. Unlike existing methods, it enables gradient-based, end-to-end learning of interpretable, axis-aligned decision trees within standard on-policy RL algorithms. Therefore, SYMPOL can become the foundation for a new class of interpretable RL based on decision trees. Our implementation is available under: https://github.com/s-marton/sympol

LGJul 1, 2024
Enabling Mixed Effects Neural Networks for Diverse, Clustered Data Using Monte Carlo Methods

Andrej Tschalzev, Paul Nitschke, Lukas Kirchdorfer et al.

Neural networks often assume independence among input data samples, disregarding correlations arising from inherent clustering patterns in real-world datasets (e.g., due to different sites or repeated measurements). Recently, mixed effects neural networks (MENNs) which separate cluster-specific 'random effects' from cluster-invariant 'fixed effects' have been proposed to improve generalization and interpretability for clustered data. However, existing methods only allow for approximate quantification of cluster effects and are limited to regression and binary targets with only one clustering feature. We present MC-GMENN, a novel approach employing Monte Carlo methods to train Generalized Mixed Effects Neural Networks. We empirically demonstrate that MC-GMENN outperforms existing mixed effects deep learning models in terms of generalization performance, time complexity, and quantification of inter-cluster variance. Additionally, MC-GMENN is applicable to a wide range of datasets, including multi-class classification tasks with multiple high-cardinality categorical features. For these datasets, we show that MC-GMENN outperforms conventional encoding and embedding methods, simultaneously offering a principled methodology for interpreting the effects of clustering patterns.

AIJan 25, 2023
Leveraging Planning Landmarks for Hybrid Online Goal Recognition

Nils Wilken, Lea Cohausz, Johannes Schaum et al.

Goal recognition is an important problem in many application domains (e.g., pervasive computing, intrusion detection, computer games, etc.). In many application scenarios it is important that goal recognition algorithms can recognize goals of an observed agent as fast as possible and with minimal domain knowledge. Hence, in this paper, we propose a hybrid method for online goal recognition that combines a symbolic planning landmark based approach and a data-driven goal recognition approach and evaluate it in a real-world cooking scenario. The empirical results show that the proposed method is not only significantly more efficient in terms of computation time than the state-of-the-art but also improves goal recognition performance. Furthermore, we show that the utilized planning landmark based approach, which was so far only evaluated on artificial benchmark domains, achieves also good recognition performance when applied to a real-world cooking scenario.

AIJan 13, 2023
Investigating the Combination of Planning-Based and Data-Driven Methods for Goal Recognition

Nils Wilken, Lea Cohausz, Johannes Schaum et al.

An important feature of pervasive, intelligent assistance systems is the ability to dynamically adapt to the current needs of their users. Hence, it is critical for such systems to be able to recognize those goals and needs based on observations of the user's actions and state of the environment. In this work, we investigate the application of two state-of-the-art, planning-based plan recognition approaches in a real-world setting. So far, these approaches were only evaluated in artificial settings in combination with agents that act perfectly rational. We show that such approaches have difficulties when used to recognize the goals of human subjects, because human behaviour is typically not perfectly rational. To overcome this issue, we propose an extension to the existing approaches through a classification-based method trained on observed behaviour data. We empirically show that the proposed extension not only outperforms the purely planning-based- and purely data-driven goal recognition methods but is also able to recognize the correct goal more reliably, especially when only a small number of observations were seen. This substantially improves the usefulness of hybrid goal recognition approaches for intelligent assistance systems, as recognizing a goal early opens much more possibilities for supportive reactions of the system.

LGJul 18, 2022
Outlier Explanation via Sum-Product Networks

Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Outlier explanation is the task of identifying a set of features that distinguish a sample from normal data, which is important for downstream (human) decision-making. Existing methods are based on beam search in the space of feature subsets. They quickly becomes computationally expensive, as they require to run an outlier detection algorithm from scratch for each feature subset. To alleviate this problem, we propose a novel outlier explanation algorithm based on Sum-Product Networks (SPNs), a class of probabilistic circuits. Our approach leverages the tractability of marginal inference in SPNs to compute outlier scores in feature subsets. By using SPNs, it becomes feasible to perform backwards elimination instead of the usual forward beam search, which is less susceptible to missing relevant features in an explanation, especially when the number of features is large. We empirically show that our approach achieves state-of-the-art results for outlier explanation, outperforming recent search-based as well as deep learning-based explanation methods

LGJul 18, 2022
Discovering Behavioral Predispositions in Data to Improve Human Activity Recognition

Maximilian Popko, Sebastian Bader, Stefan Lüdtke et al.

The automatic, sensor-based assessment of challenging behavior of persons with dementia is an important task to support the selection of interventions. However, predicting behaviors like apathy and agitation is challenging due to the large inter- and intra-patient variability. Goal of this paper is to improve the recognition performance by making use of the observation that patients tend to show specific behaviors at certain times of the day or week. We propose to identify such segments of similar behavior via clustering the distributions of annotations of the time segments. All time segments within a cluster then consist of similar behaviors and thus indicate a behavioral predisposition (BPD). We utilize BPDs by training a classifier for each BPD. Empirically, we demonstrate that when the BPD per time segment is known, activity recognition performance can be substantially improved.

LGAug 7, 2023
Towards Machine Learning-based Fish Stock Assessment

Stefan Lüdtke, Maria E. Pierce

The accurate assessment of fish stocks is crucial for sustainable fisheries management. However, existing statistical stock assessment models can have low forecast performance of relevant stock parameters like recruitment or spawning stock biomass, especially in ecosystems that are changing due to global warming and other anthropogenic stressors. In this paper, we investigate the use of machine learning models to improve the estimation and forecast of such stock parameters. We propose a hybrid model that combines classical statistical stock assessment models with supervised ML, specifically gradient boosted trees. Our hybrid model leverages the initial estimate provided by the classical model and uses the ML model to make a post-hoc correction to improve accuracy. We experiment with five different stocks and find that the forecast accuracy of recruitment and spawning stock biomass improves considerably in most cases.

CLOct 29, 2025Code
MCP4IFC: IFC-Based Building Design Using Large Language Models

Bharathi Kannan Nithyanantham, Tobias Sesterhenn, Ashwin Nedungadi et al.

Bringing generative AI into the architecture, engineering and construction (AEC) field requires systems that can translate natural language instructions into actions on standardized data models. We present MCP4IFC, a comprehensive open-source framework that enables Large Language Models (LLMs) to directly manipulate Industry Foundation Classes (IFC) data through the Model Context Protocol (MCP). The framework provides a set of BIM tools, including scene querying tools for information retrieval, predefined functions for creating and modifying common building elements, and a dynamic code-generation system that combines in-context learning with retrieval-augmented generation (RAG) to handle tasks beyond the predefined toolset. Experiments demonstrate that an LLM using our framework can successfully perform complex tasks, from building a simple house to querying and editing existing IFC data. Our framework is released as open-source to encourage research in LLM-driven BIM design and provide a foundation for AI-assisted modeling workflows. Our code is available at https://show2instruct.github.io/mcp4ifc/.

LGMay 5, 2023Code
GradTree: Learning Axis-Aligned Decision Trees with Gradient Descent

Sascha Marton, Stefan Lüdtke, Christian Bartelt et al.

Decision Trees (DTs) are commonly used for many machine learning tasks due to their high degree of interpretability. However, learning a DT from data is a difficult optimization problem, as it is non-convex and non-differentiable. Therefore, common approaches learn DTs using a greedy growth algorithm that minimizes the impurity locally at each internal node. Unfortunately, this greedy procedure can lead to inaccurate trees. In this paper, we present a novel approach for learning hard, axis-aligned DTs with gradient descent. The proposed method uses backpropagation with a straight-through operator on a dense DT representation, to jointly optimize all tree parameters. Our approach outperforms existing methods on binary classification benchmarks and achieves competitive results for multi-class tasks. The method is available under: https://github.com/s-marton/GradTree

LGMar 12, 2025
Unreflected Use of Tabular Data Repositories Can Undermine Research Quality

Andrej Tschalzev, Lennart Purucker, Stefan Lüdtke et al.

Data repositories have accumulated a large number of tabular datasets from various domains. Machine Learning researchers are actively using these datasets to evaluate novel approaches. Consequently, data repositories have an important standing in tabular data research. They not only host datasets but also provide information on how to use them in supervised learning tasks. In this paper, we argue that, despite great achievements in usability, the unreflected usage of datasets from data repositories may have led to reduced research quality and scientific rigor. We present examples from prominent recent studies that illustrate the problematic use of datasets from OpenML, a large data repository for tabular data. Our illustrations help users of data repositories avoid falling into the traps of (1) using suboptimal model selection strategies, (2) overlooking strong baselines, and (3) inappropriate preprocessing. In response, we discuss possible solutions for how data repositories can prevent the inappropriate use of datasets and become the cornerstones for improved overall quality of empirical research studies.

LGJan 15, 2025
Disentangling Exploration of Large Language Models by Optimal Exploitation

Tim Grams, Patrick Betz, Sascha Marton et al.

Exploration is a crucial skill for in-context reinforcement learning in unknown environments. However, it remains unclear if large language models can effectively explore a partially hidden state space. This work isolates exploration as the sole objective, tasking an agent with gathering information that enhances future returns. Within this framework, we argue that measuring agent returns is not sufficient for a fair evaluation. Hence, we decompose missing rewards into their exploration and exploitation components based on the optimal achievable return. Experiments with various models reveal that most struggle to explore the state space, and weak exploration is insufficient. Nevertheless, we found a positive correlation between exploration performance and reasoning capabilities. Our decomposition can provide insights into differences in behaviors driven by prompt engineering, offering a valuable tool for refining performance in exploratory tasks.

LGNov 27, 2025
Where to Measure: Epistemic Uncertainty-Based Sensor Placement with ConvCNPs

Feyza Eksen, Stefan Oehmcke, Stefan Lüdtke

Accurate sensor placement is critical for modeling spatio-temporal systems such as environmental and climate processes. Neural Processes (NPs), particularly Convolutional Conditional Neural Processes (ConvCNPs), provide scalable probabilistic models with uncertainty estimates, making them well-suited for data-driven sensor placement. However, existing approaches rely on total predictive uncertainty, which conflates epistemic and aleatoric components, that may lead to suboptimal sensor selection in ambiguous regions. To address this, we propose expected reduction in epistemic uncertainty as a new acquisition function for sensor placement. To enable this, we extend ConvCNPs with a Mixture Density Networks (MDNs) output head for epistemic uncertainty estimation. Preliminary results suggest that epistemic uncertainty driven sensor placement more effectively reduces model error than approaches based on overall uncertainty.

AISep 1, 2023
On the Aggregation of Rules for Knowledge Graph Completion

Patrick Betz, Stefan Lüdtke, Christian Meilicke et al.

Rule learning approaches for knowledge graph completion are efficient, interpretable and competitive to purely neural models. The rule aggregation problem is concerned with finding one plausibility score for a candidate fact which was simultaneously predicted by multiple rules. Although the problem is ubiquitous, as data-driven rule learning can result in noisy and large rulesets, it is underrepresented in the literature and its theoretical foundations have not been studied before in this context. In this work, we demonstrate that existing aggregation approaches can be expressed as marginal inference operations over the predicting rules. In particular, we show that the common Max-aggregation strategy, which scores candidates based on the rule with the highest confidence, has a probabilistic interpretation. Finally, we propose an efficient and overlooked baseline which combines the previous strategies and is competitive to computationally more expensive approaches.

AIFeb 1, 2022
Activity Recognition in Assembly Tasks by Bayesian Filtering in Multi-Hypergraphs

Timon Felske, Stefan Lüdtke, Sebastian Bader et al.

We study sensor-based human activity recognition in manual work processes like assembly tasks. In such processes, the system states often have a rich structure, involving object properties and relations. Thus, estimating the hidden system state from sensor observations by recursive Bayesian filtering can be very challenging, due to the combinatorial explosion in the number of system states. To alleviate this problem, we propose an efficient Bayesian filtering model for such processes. In our approach, system states are represented by multi-hypergraphs, and the system dynamics is modeled by graph rewriting rules. We show a preliminary concept that allows to represent distributions over multi-hypergraphs more compactly than by full enumeration, and present an inference algorithm that works directly on this compact representation. We demonstrate the applicability of the algorithm on a real dataset.

SPOct 28, 2021
Human Activity Recognition using Attribute-Based Neural Networks and Context Information

Stefan Lüdtke, Fernando Moya Rueda, Waqas Ahmed et al.

We consider human activity recognition (HAR) from wearable sensor data in manual-work processes, like warehouse order-picking. Such structured domains can often be partitioned into distinct process steps, e.g., packaging or transporting. Each process step can have a different prior distribution over activity classes, e.g., standing or walking, and different system dynamics. Here, we show how such context information can be integrated systematically into a deep neural network-based HAR system. Specifically, we propose a hybrid architecture that combines a deep neural network-that estimates high-level movement descriptors, attributes, from the raw-sensor data-and a shallow classifier, which predicts activity classes from the estimated attributes and (optional) context information, like the currently executed process step. We empirically show that our proposed architecture increases HAR performance, compared to state-of-the-art methods. Additionally, we show that HAR performance can be further increased when information about process steps is incorporated, even when that information is only partially correct.

LGOct 11, 2021
Exchangeability-Aware Sum-Product Networks

Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts.

AIApr 18, 2018
State-Space Abstractions for Probabilistic Inference: A Systematic Review

Stefan Lüdtke, Max Schröder, Frank Krüger et al.

Tasks such as social network analysis, human behavior recognition, or modeling biochemical reactions, can be solved elegantly by using the probabilistic inference framework. However, standard probabilistic inference algorithms work at a propositional level, and thus cannot capture the symmetries and redundancies that are present in these tasks. Algorithms that exploit those symmetries have been devised in different research fields, for example by the lifted inference-, multiple object tracking-, and modeling and simulation-communities. The common idea, that we call state space abstraction, is to perform inference over compact representations of sets of symmetric states. Although they are concerned with a similar topic, the relationship between these approaches has not been investigated systematically. This survey provides the following contributions. We perform a systematic literature review to outline the state of the art in probabilistic inference methods exploiting symmetries. From an initial set of more than 4,000 papers, we identify 116 relevant papers. Furthermore, we provide new high-level categories that classify the approaches, based on common properties of the approaches. The research areas underlying each of the categories are introduced concisely. Researchers from different fields that are confronted with a state space explosion problem in a probabilistic system can use this classification to identify possible solutions. Finally, based on this conceptualization, we identify potentials for future research, as some relevant application domains are not addressed by current approaches.

AIJan 31, 2018
Lifted Filtering via Exchangeable Decomposition

Stefan Lüdtke, Max Schröder, Sebastian Bader et al.

We present a model for exact recursive Bayesian filtering based on lifted multiset states. Combining multisets with lifting makes it possible to simultaneously exploit multiple strategies for reducing inference complexity when compared to list-based grounded state representations. The core idea is to borrow the concept of Maximally Parallel Multiset Rewriting Systems and to enhance it by concepts from Rao-Blackwellization and Lifted Inference, giving a representation of state distributions that enables efficient inference. In worlds where the random variables that define the system state are exchangeable -- where the identity of entities does not matter -- it automatically uses a representation that abstracts from ordering (achieving an exponential reduction in complexity) -- and it automatically adapts when observations or system dynamics destroy exchangeability by breaking symmetry.

AIJul 20, 2017
Sequential Lifted Bayesian Filtering in Multiset Rewriting Systems

Max Schröder, Stefan Lüdtke, Sebastian Bader et al.

Bayesian Filtering for plan and activity recognition is challenging for scenarios that contain many observation equivalent entities (i.e. entities that produce the same observations). This is due to the combinatorial explosion in the number of hypotheses that need to be tracked. However, this class of problems exhibits a certain symmetry that can be exploited for state space representation and inference. We analyze current state of the art methods and find that none of them completely fits the requirements arising in this problem class. We sketch a novel inference algorithm that provides a solution by incorporating concepts from Lifted Inference algorithms, Probabilistic Multiset Rewriting Systems, and Computational State Space Models. Two experiments confirm that this novel algorithm has the potential to perform efficient probabilistic inference on this problem class.