Sebastian Müller

LG
h-index 5
14 papers
131 citations
Novelty 39%
AI Score 52

14 Papers

66.9 · CE · Jun 2
Cost of Manipulation in AMM-Based Oracles

Sebastian Müller, Nordine Moumeni, Adel Messaoudi

We study the robustness of AMM-based on-chain price oracles to strategic manipulation. An attacker trades against constant-product automated market makers (CPMMs) to distort an on-chain oracle, arbitrageurs restore cross-pool and cross-venue consistency, and an oracle designer chooses how to aggregate pool quotes. Taking an efficient-market-hypothesis (EMH) view of the off-chain "true" price, we define the cost of manipulation as the minimal mark-to-market loss that an attacker must incur to move the oracle by a given multiplicative factor. For independent CPMMs, we derive closed-form single-pool manipulation formulas and solve the attacker-designer game for weighted means and weighted medians. Within these classes, liquidity weights maximize the minimum cost of manipulation: for weighted medians at every distortion level, and for weighted means locally, as the distortion tends to zero. For larger distortions, weighted means become more fragile: optimal weights can depend on the target distortion, and no single choice is uniformly optimal across distortion levels. In a frictionless CPMM model with cross-pool arbitrage, the manipulation cost depends only on the total quote depth and coincides across symmetric aggregators. We extend this framework to multi-asset star architectures and confirm that liquidity weights remain optimal in the same sense. Finally, we bridge theory and practice by incorporating dwell times and rate limits, providing a quantitative yardstick for sizing oracles against the explicit economic cost of an attack.
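
To illustrate what a single-pool formula of this kind can look like, the sketch below uses the textbook fee-free CPMM invariant (reserves $x \cdot y$ held constant); the derivation is ours and is not taken from the paper. Moving the pool price $p = y/x$ by a factor $\alpha$ rescales the reserves to $x/\sqrt{\alpha}$ and $y\sqrt{\alpha}$, and valuing the attacker's trade at the unchanged true price gives a loss of $y(\sqrt{\alpha}-1)^2/\sqrt{\alpha}$ in quote units. It assumes the pool starts at the true price and ignores fees. The function names, and the `weighted_median` helper used to show a liquidity-weighted aggregator, are hypothetical.

```python
import math


def cpmm_reserves_after(base, quote, alpha):
    """Reserves of a fee-free pool (base * quote held constant) after a trade
    that moves its price quote/base by the multiplicative factor alpha."""
    r = math.sqrt(alpha)
    return base / r, quote * r


def cpmm_manipulation_cost(quote, alpha):
    """Mark-to-market loss, in quote units, of moving one pool's price by alpha.

    Assumes (hypothetically) that the pool initially sits at the true price
    and that the attacker's position is valued at that unchanged true price.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    r = math.sqrt(alpha)
    return quote * (r - 1.0) ** 2 / r


def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight
    reaches half of the total weight (e.g. pool quotes weighted by liquidity)."""
    half = sum(weights) / 2.0
    acc = 0.0
    for value, weight in sorted(zip(values, weights)):
        acc += weight
        if acc >= half:
            return value
    raise ValueError("weights must have a positive sum")
```

For example, with 1,000 units of quote reserve, doubling the reported price (α = 2) costs about 1000·(√2 − 1)²/√2 ≈ 121.3 quote units under these assumptions; in this sketch the cost scales linearly with the pool's quote reserve.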

28.9 · PR · Jun 2
Stability of local tip pool sizes

Sebastian Müller, Isabel Amigo, Alexandre Reiffers-Masson et al.

In directed acyclic graph (DAG)-based distributed ledgers, unreferenced blocks (tips) form the backlog of a distributed queueing system. Each new block creates one tip and attempts to remove up to $k$ existing tips by referencing them. With heterogeneous propagation delays, these service decisions are made from delayed local information, so nodes may disagree on the backlog and some reference attempts are wasted. We study a continuous-time Poisson model with bounded heterogeneous delays and uniform tip selection. We prove that the embedded tip-configuration chain is irreducible, aperiodic, and positive Harris recurrent, and hence admits a unique stationary regime. The observer and local tip-pool sizes have stationary exponential moments, converge to their stationary limits, and satisfy almost-sure ergodic averages. We also derive a Little-type identity relating the stationary mean observer tip count to the mean time until a typical block is first referenced. Simulations are included as qualitative illustrations of the effects of delay variability and issuance heterogeneity.
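
For intuition, here is a rough simulation sketch, not the paper's construction: blocks arrive as a Poisson process, each new block sees the tip pool as it was a fixed `delay` time units earlier (a single constant delay, a simplification of the bounded heterogeneous delays studied in the paper), and references up to `k` tips chosen uniformly from that stale view. References to tips that another block has already referenced in the meantime are counted as wasted. The function name and parameters are hypothetical.

```python
import random
from collections import deque


def simulate_tip_pool(n_blocks, rate=1.0, delay=1.0, k=2, seed=0):
    """Return (observer tip-pool size seen by each arriving block, wasted refs).

    Hypothetical toy model: one constant propagation delay for everyone.
    """
    rng = random.Random(seed)
    observer_tips = {0}  # true (observer) tip pool; block 0 is genesis
    stale_tips = {0}     # tip pool as it looked `delay` time units ago
    pending = deque()    # (issue_time, block_id, refs) not yet in the stale view
    t = 0.0
    sizes, wasted = [], 0
    for block_id in range(1, n_blocks + 1):
        t += rng.expovariate(rate)  # exponential inter-arrival times
        # Advance the stale view to time t - delay.
        while pending and pending[0][0] <= t - delay:
            _, old_id, old_refs = pending.popleft()
            stale_tips.difference_update(old_refs)
            stale_tips.add(old_id)
        sizes.append(len(observer_tips))
        candidates = sorted(stale_tips)  # sorted for reproducibility
        refs = rng.sample(candidates, min(k, len(candidates)))
        # A reference is wasted if that tip was already referenced by someone.
        wasted += sum(1 for r in refs if r not in observer_tips)
        observer_tips.difference_update(refs)
        observer_tips.add(block_id)
        pending.append((t, block_id, refs))
    return sizes, wasted
```

In this sketch, raising `delay` or `rate` should tend to enlarge the average observer tip count, since more recent blocks remain invisible to each new issuer.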

LG · Jun 27, 2023
An Empirical Evaluation of the Rashomon Effect in Explainable Machine Learning

Sebastian Müller, Vanessa Toborek, Katharina Beckh et al.

The Rashomon Effect describes the following phenomenon: for a given dataset, there may exist many models with equally good performance but different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and highlight challenges for both scientists and practitioners.
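
Comparing explanations requires a similarity metric. As one illustration (our choice, not necessarily among the metrics evaluated in the paper), the sketch below compares two feature-attribution vectors, for instance from two equally accurate models, via top-k feature agreement and Spearman rank correlation of absolute attributions. The closed-form Spearman formula assumes no tied ranks; here ties are simply broken by feature index. Function names are hypothetical.

```python
def magnitude_ranks(attribution):
    """Rank features by |attribution|; rank 0 is the most important.
    Ties are broken by feature index because sorted() is stable."""
    order = sorted(range(len(attribution)), key=lambda i: -abs(attribution[i]))
    ranks = [0] * len(attribution)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


def top_k_agreement(attr_a, attr_b, k):
    """Fraction of the k most important features that both explanations share."""
    ra, rb = magnitude_ranks(attr_a), magnitude_ranks(attr_b)
    top_a = {i for i, r in enumerate(ra) if r < k}
    top_b = {i for i, r in enumerate(rb) if r < k}
    return len(top_a & top_b) / k


def spearman_rank_correlation(attr_a, attr_b):
    """Spearman's rho of the magnitude rankings (no-ties formula, n >= 2)."""
    n = len(attr_a)
    ra, rb = magnitude_ranks(attr_a), magnitude_ranks(attr_b)
    d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))
```

Two explanations can disagree completely on the top features (agreement 0) while still being computed for models with identical accuracy, which is the comparability problem the abstract refers to.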

LG · Dec 18, 2025 · Code
INTELLECT-3: Technical Report

Prime Intellect Team, Mika Senghaas, Fares Obeid et al.

We present INTELLECT-3, a 106B-parameter Mixture-of-Experts model (12B active) trained with large-scale reinforcement learning on our end-to-end RL infrastructure stack. INTELLECT-3 achieves state-of-the-art performance for its size across math, code, science, and reasoning benchmarks, outperforming many larger frontier models. We open-source the model together with the full infrastructure stack used to create it, including RL frameworks, the complete recipe, and a broad collection of training and evaluation environments built with the verifiers library and drawn from our Environments Hub community platform. For this effort, we also introduce prime-rl, an open framework for large-scale asynchronous reinforcement learning that scales from a single node to thousands of GPUs and is tailored for agentic RL, with first-class support for multi-turn interactions and tool use. Using this stack, we run both SFT and RL training on top of the GLM-4.5-Air-Base model, scaling RL training to 512 H200s with high training efficiency.

OC · Jul 11, 2022
Multilevel Geometric Optimization for Regularised Constrained Linear Inverse Problems

Sebastian Müller, Stefania Petra, Matthias Zisler

We present a geometric multilevel optimization approach that smoothly incorporates box constraints. Given a box-constrained optimization problem, we consider a hierarchy of models with varying discretization levels. Finer models are accurate but expensive to compute, while coarser models are less accurate but cheaper. When working at the fine level, multilevel optimization computes the search direction based on a coarser model, which speeds up updates at the fine level. Moreover, by exploiting the geometry induced by the hierarchy, our approach preserves the feasibility of the updates. In particular, it extends classical components of multigrid methods, such as restriction and prolongation, to the Riemannian structure of our constraints.

17.6 · GT · May 8
Game-Theoretic Analysis of Transaction Selection in DAG-Based Distributed Ledgers

Sebastian Müller, Alexandre Reiffers-Masson

Transaction selection in parallel or DAG-based distributed ledger technologies (DLTs) is a crucial challenge that directly impacts throughput, fairness, and validator incentives. In these systems, validators independently choose transactions to include in their blocks, often relying on naive heuristics such as uniform or proportional selection. This can lead to inefficient outcomes when validators prioritize their own rewards without considering collective impacts. We analyze two fee allocation mechanisms used in practice: Random Fee Allocation (RFA), where each transaction's fee is randomly assigned to one validator, and Collaborative Fee Sharing (CFS), where fees are distributed equally among all validators. Using a single-shot game-theoretic framework, we derive the symmetric Nash equilibria (NE) of the transaction-selection game under both mechanisms and propose an optimization-based method to compute them. Numerical simulations show that the NE of CFS consistently achieves higher throughput and rewards than the NE of RFA, particularly under skewed fee distributions. We also compare these equilibrium strategies with naive benchmarks (uniform and proportional selection) and find that the proportional strategy outperforms the NE of RFA in many situations. These findings may provide actionable insights into the design of transaction selection and incentive mechanisms for more robust, high-performance DAG-based DLTs.
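
The incentive gap between the two mechanisms can be seen in a deliberately simplified model (our assumption, not the paper's formulation): each of n validators includes a given transaction independently with probability p. Under RFA, a validator that includes the transaction earns its fee only if it is the one picked among all includers, so its expected share is fee · E[1/(1+M)] with M ~ Binomial(n−1, p), which equals fee · (1 − (1−p)^n)/(np). Under CFS, including the transaction adds value only when no other validator included it, and that value is split n ways. Function names are hypothetical.

```python
import math


def inclusion_prob(p, n):
    """Probability that at least one of n validators includes the transaction."""
    return 1.0 - (1.0 - p) ** n


def rfa_value(fee, p, n):
    """Expected fee a validator earns by including a transaction under RFA,
    if each of the other n-1 validators includes it independently w.p. p.
    Uses E[1/(1+M)] = (1 - (1-p)^n) / (n p) for M ~ Binomial(n-1, p)."""
    if p == 0.0:
        return fee  # nobody else includes it, so the whole fee is ours
    return fee * inclusion_prob(p, n) / (n * p)


def cfs_value(fee, p, n):
    """Marginal expected reward under CFS: the fee enters the shared pot only
    if no other validator included it, and the pot is split n ways."""
    return fee * (1.0 - p) ** (n - 1) / n


def expected_throughput(probs, n):
    """Expected number of distinct transactions included by at least one validator."""
    return sum(inclusion_prob(p, n) for p in probs)
```

With a fee of 10, n = 4 and p = 0.5, RFA still rewards inclusion with 10·(1 − 0.5^4)/(4·0.5) = 4.6875, whereas the CFS marginal value is only 10·0.5^3/4 = 0.3125; in this toy model, CFS therefore discourages duplicating transactions that others already include.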

LG · Jan 7
Improving Compactness and Reducing Ambiguity of CFIRE Rule-Based Explanations

Sebastian Müller, Tobias Schneider, Ruben Kemna et al.

Models trained on tabular data are widely used in sensitive domains, increasing the demand for explanation methods to meet transparency needs. CFIRE is a recent algorithm in this domain that constructs compact surrogate rule models from local explanations. While effective, CFIRE may assign rules associated with different classes to the same sample, introducing ambiguity. We investigate this ambiguity and propose a post-hoc pruning strategy that removes rules with low contribution or conflicting coverage, yielding smaller and less ambiguous models while preserving fidelity. Experiments across multiple datasets confirm these improvements with minimal impact on predictive performance.
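
A post-hoc pruning step of the kind described can be sketched as greedy backward elimination. In this hypothetical version (not CFIRE's actual criteria), a rule is a class label plus per-feature interval conditions, a sample's prediction is the majority label among the rules covering it (ties and uncovered samples count as no prediction), and a rule is dropped whenever removing it neither lowers fidelity to the black-box labels nor increases ambiguity, i.e. the fraction of samples covered by rules of more than one class.

```python
from collections import Counter


def covers(conditions, x):
    """True if sample x satisfies every interval condition {feature: (lo, hi)}."""
    return all(lo <= x[f] < hi for f, (lo, hi) in conditions.items())


def predict(rules, x):
    """Majority label among covering rules; None if uncovered or tied."""
    votes = Counter(label for label, conds in rules if covers(conds, x))
    top = votes.most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None
    return top[0][0]


def ambiguity(rules, X):
    """Fraction of samples covered by rules of at least two different classes."""
    return sum(len({label for label, conds in rules if covers(conds, x)}) > 1
               for x in X) / len(X)


def fidelity(rules, X, y_blackbox):
    """Fraction of samples where the rule model reproduces the black-box label."""
    return sum(predict(rules, x) == t for x, t in zip(X, y_blackbox)) / len(X)


def prune(rules, X, y_blackbox):
    """Greedily drop rules whose removal keeps fidelity and does not add ambiguity."""
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for i in range(len(rules)):
            candidate = rules[:i] + rules[i + 1:]
            if (fidelity(candidate, X, y_blackbox) >= fidelity(rules, X, y_blackbox)
                    and ambiguity(candidate, X) <= ambiguity(rules, X)):
                rules = candidate
                changed = True
                break
    return rules
```

On a toy set with rules "class 0 if x0 < 0.5", "class 1 if x0 ≥ 0.5" and a conflicting "class 1 if x1 < 0.2", this procedure removes only the conflicting rule, which raises fidelity and eliminates the ambiguous sample.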

LG · Feb 2
Scientific Theory of a Black-Box: A Life Cycle-Scale XAI Framework Based on Constructive Empiricism

Sebastian Müller, Vanessa Toborek, Eike Stadtländer et al.

Explainable AI (XAI) offers a growing number of algorithms that aim to answer specific questions about black-box models. What is missing is a principled way to consolidate explanatory information about a fixed black-box model into a persistent, auditable artefact that accompanies the black box throughout its life cycle. We address this gap by introducing the notion of a scientific theory of a black box (SToBB). Grounded in Constructive Empiricism, a SToBB fulfils three obligations: (i) empirical adequacy with respect to all available observations of black-box behaviour, (ii) adaptability via explicit update commitments that restore adequacy when new observations arrive, and (iii) auditability through transparent documentation of assumptions, construction choices, and update behaviour. We operationalise these obligations as a general framework that specifies an extensible observation base, a traceable hypothesis class, algorithmic components for construction and revision, and documentation sufficient for third-party assessment. Explanations for concrete stakeholder needs are then obtained by querying the maintained record through interfaces, rather than by producing isolated method outputs. As a proof of concept, we instantiate a complete SToBB for a neural-network classifier on a tabular task and introduce the Constructive Box Theoriser (CoBoT) algorithm, an online procedure that constructs and maintains an empirically adequate rule-based surrogate as observations accumulate. Together, these contributions position SToBBs as a life cycle-scale, inspectable point of reference that supports consistent, reusable analyses and systematic external scrutiny.

CL · Jan 4
Four Quadrants of Difficulty: A Simple Categorisation and its Limits

Vanessa Toborek, Sebastian Müller, Christian Bauckhage

Curriculum Learning (CL) aims to improve the outcome of model training by estimating the difficulty of samples and scheduling them accordingly. In NLP, difficulty is commonly approximated using task-agnostic linguistic heuristics or human intuition, implicitly assuming that these signals correlate with what neural models find difficult to learn. We propose a four-quadrant categorisation of difficulty signals -- human vs. model and task-agnostic vs. task-dependent -- and systematically analyse their interactions on a natural language understanding dataset. We find that task-agnostic features behave largely independently and that only task-dependent features align. These findings challenge common CL intuitions and highlight the need for lightweight, task-dependent difficulty estimators that better reflect model learning behaviour.

CL · Aug 27, 2025
Beyond Shallow Heuristics: Leveraging Human Intuition for Curriculum Learning

Vanessa Toborek, Sebastian Müller, Tim Selbach et al.

Curriculum learning (CL) aims to improve training by presenting data from "easy" to "hard", yet defining and measuring linguistic difficulty remains an open challenge. We investigate whether human-curated simple language can serve as an effective signal for CL. Using the article-level labels from the Simple Wikipedia corpus, we compare label-based curricula to competence-based strategies relying on shallow heuristics. Our experiments with a BERT-tiny model show that adding simple data alone yields no clear benefit. However, structuring it via a curriculum -- especially when introduced first -- consistently improves perplexity, particularly on simple language. In contrast, competence-based curricula lead to no consistent gains over random ordering, probably because they fail to effectively separate the two classes. Our results suggest that human intuition about linguistic difficulty can guide CL for language model pre-training.
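
The two families of curricula compared above can be sketched as data orderings. The label-based variant places examples labelled simple first; the competence-based variant, using the square-root competence schedule of Platanios et al. (2019) as an assumed stand-in, lets the model draw only from the easiest fraction c(t) of the data according to a shallow heuristic (here, string length). Names and parameters are hypothetical, and the paper's exact schedules may differ.

```python
import math
import random


def competence(t, total_steps, c0=0.01):
    """Square-root competence schedule: starts at c0 and reaches 1 at total_steps."""
    return min(1.0, math.sqrt(t * (1.0 - c0 ** 2) / total_steps + c0 ** 2))


def eligible_pool(examples, difficulty, t, total_steps, c0=0.01):
    """Competence-based curriculum: the easiest ceil(c(t) * N) examples
    under a (shallow) difficulty heuristic such as length."""
    ranked = sorted(examples, key=difficulty)
    n = max(1, math.ceil(competence(t, total_steps, c0) * len(ranked)))
    return ranked[:n]


def simple_first(examples, is_simple, seed=0):
    """Label-based curriculum: all examples labelled simple first, then the rest,
    each stage shuffled independently."""
    rng = random.Random(seed)
    simple = [e for e in examples if is_simple(e)]
    rest = [e for e in examples if not is_simple(e)]
    rng.shuffle(simple)
    rng.shuffle(rest)
    return simple + rest
```

The competence-based pool only separates "simple" from "regular" text to the extent that the heuristic does, which is one way to read the abstract's explanation for why such curricula gave no consistent gains.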

LG · Apr 1, 2025
CFIRE: A General Method for Combining Local Explanations

Sebastian Müller, Vanessa Toborek, Tamás Horváth et al.

We propose a novel eXplainable AI algorithm to compute faithful, easy-to-understand, and complete global decision rules from local explanations for tabular data by combining XAI methods with closed frequent itemset mining. Our method can be used with any local explainer that indicates which dimensions are important for a given sample for a given black-box decision. This property allows our algorithm to choose among different local explainers, addressing the disagreement problem, i.e., the observation that no single explanation method consistently outperforms others across models and datasets. Unlike common experimental practice, our evaluation also accounts for the Rashomon effect in model explainability. Accordingly, we demonstrate the robustness of our approach in finding suitable rules for nearly all of the 700 black-box models we considered across 14 benchmark datasets. The results also show that our method achieves improved runtime and high precision and F1-score while generating compact and complete rules.
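
The mining step can be illustrated with a brute-force sketch, exponential in the number of features and therefore only suitable for small examples (dedicated closed-itemset miners such as CHARM or LCM are far more efficient). Each sample's local explanation is binarized into the set of its top-k features; closed frequent itemsets are then the feature sets that occur in at least `min_support` explanations and have no proper superset with the same support. How CFIRE turns these itemsets into rules is not reproduced here, and the function names are hypothetical.

```python
from itertools import combinations


def explanation_to_itemset(attribution, k):
    """Binarize a local explanation into the set of its k largest-|attribution| features."""
    order = sorted(range(len(attribution)), key=lambda i: -abs(attribution[i]))
    return frozenset(order[:k])


def closed_frequent_itemsets(transactions, min_support):
    """Brute force: map each closed frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(itemset <= t for t in transactions)
            if count >= min_support:
                support[itemset] = count
    # Closed: no frequent proper superset has exactly the same support.
    return {s: c for s, c in support.items()
            if not any(s < other and c == support[other] for other in support)}
```

For the explanations {0, 1}, {0, 1, 2} and {0, 2} with `min_support=2`, the closed itemsets are {0} (support 3), {0, 1} and {0, 2} (support 2 each); {1} and {2} are not closed because their supersets with feature 0 have the same support.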

LG · May 21, 2021
Explainable Machine Learning with Prior Knowledge: An Overview

Katharina Beckh, Sebastian Müller, Matthias Jakobs et al.

This survey presents an overview of integrating prior knowledge into machine learning systems in order to improve explainability. The complexity of machine learning models has prompted research to make them more explainable. However, most explainability methods cannot provide insight beyond the given data, which requires additional information about the context. We propose to harness prior knowledge to improve the explanation capabilities of machine learning models. In this paper, we categorize current research into three main categories: approaches that integrate knowledge into the machine learning pipeline, approaches that integrate it into the explainability method, and approaches that derive knowledge from explanations. To classify the papers, we build upon the existing taxonomy of informed machine learning and extend it from the perspective of explainability. We conclude with open challenges and research directions.

SE · Aug 3, 2020
Bet and Run for Test Case Generation

Sebastian Müller, Thomas Vogel, Lars Grunske

Anyone working in the technology sector is probably familiar with the question "Have you tried turning it off and on again?", as this is usually the default question asked by tech support. Similarly, it is known in search-based testing that metaheuristics can get trapped on a plateau during a search. A human can look at the gradient of the fitness curve and decide to restart the search, in the hope that the next run improves the results of the optimization. To automate such a restart, one must decide programmatically whether the metaheuristic has reached a plateau, which is an inherently difficult problem. To mitigate this problem in the context of theoretical search problems, the Bet and Run strategy was developed: multiple algorithm instances are started concurrently, and after some time all but the single most promising instance, in terms of fitness values, are killed. In this paper, we adopt and evaluate the Bet and Run strategy for the problem of test case generation. Our work indicates that using this restart strategy does not generally improve the quality metrics when it is instantiated with the best parameters found in the literature.
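
The strategy described above can be sketched on a toy problem. The example substitutes a (1+1) evolutionary algorithm on OneMax (maximizing the number of 1-bits) for a test generator; this stand-in, the class name, and the parameter names `k`, `t1` and `total_budget` are our assumptions rather than the paper's setup. Instances are run one after another rather than truly concurrently, which does not change the outcome because each instance owns its random generator.

```python
import random


class OneMaxHillClimber:
    """Hypothetical stand-in for a metaheuristic: (1+1) EA on OneMax."""

    def __init__(self, n_bits, seed):
        self.rng = random.Random(seed)
        self.n = n_bits
        self.x = [self.rng.randint(0, 1) for _ in range(n_bits)]
        self.best_fitness = sum(self.x)

    def step(self):
        # Flip each bit with probability 1/n; keep the child if it is no worse.
        y = [b ^ 1 if self.rng.random() < 1.0 / self.n else b for b in self.x]
        if sum(y) >= self.best_fitness:
            self.x, self.best_fitness = y, sum(y)


def bet_and_run(make_instance, k, t1, total_budget):
    """Run k instances for t1 steps each, keep the fittest, and spend the
    remaining budget on it alone. Returns its final best fitness."""
    if total_budget < k * t1:
        raise ValueError("total_budget must cover the k * t1 betting phase")
    instances = [make_instance(seed) for seed in range(k)]
    for inst in instances:          # betting phase
        for _ in range(t1):
            inst.step()
    best = max(instances, key=lambda inst: inst.best_fitness)
    for _ in range(total_budget - k * t1):  # run phase: all others are "killed"
        best.step()
    return best.best_fitness
```

With `k=1` the strategy degenerates to a single uninterrupted run with the same budget, which is a convenient baseline when judging whether the restarts pay off.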

HC · Feb 3, 2020
A Survey on Human Machine Interaction in Industry 4.0

Christian Krupitzer, Sebastian Müller, Veronika Lesch et al.

Industry 4.0 and the Industrial IoT both describe new paradigms for seamless interaction between humans and machines. Both concepts rely on intelligent, interconnected cyber-physical production systems that are able to control the process flow of industrial production. As these machines make many decisions autonomously and further interact with production and manufacturing planning systems, the integration of human users requires new paradigms. In this paper, we provide an analysis of the current state of the art in human-machine interaction in the Industry 4.0 domain. We focus on new paradigms that integrate augmented and virtual reality technology. Based on our analysis, we further discuss research challenges.