Sebastian Weichwald

ML
h-index: 9
24 papers
985 citations
Novelty: 42%
AI Score: 56

24 Papers

ML · Mar 31, 2023
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models

Alexander G. Reisach, Myriam Tami, Christof Seiler et al.

Additive Noise Models (ANMs) are a common model class for causal discovery from observational data and are often used to generate synthetic data for causal discovery benchmarking. Specifying an ANM requires choosing all parameters, including those not fixed by explicit assumptions. Reisach et al. (2021) show that sorting variables by increasing variance often yields an ordering close to a causal order and introduce var-sortability to quantify this alignment. Since increasing variances may be unrealistic and are scale-dependent, ANM data are often standardized in benchmarks. We show that synthetic ANM data are characterized by another pattern that is scale-invariant: the explainable fraction of a variable's variance, as captured by the coefficient of determination $R^2$, tends to increase along the causal order. The result is high $R^2$-sortability, meaning that sorting the variables by increasing $R^2$ yields an ordering close to a causal order. We propose an efficient baseline algorithm termed $R^2$-SortnRegress that exploits high $R^2$-sortability and that can match and exceed the performance of established causal discovery algorithms. We show analytically that sufficiently high edge weights lead to a relative decrease of the noise contributions along causal chains, resulting in increasingly deterministic relationships and high $R^2$. We characterize $R^2$-sortability for different simulation parameters and find high values in common settings. Our findings reveal high $R^2$-sortability as an assumption about the data generating process relevant to causal discovery and implicit in many ANM sampling schemes. It should be made explicit, as its prevalence in real-world data is unknown. For causal discovery benchmarking, we implement $R^2$-sortability, the $R^2$-SortnRegress algorithm, and ANM simulation procedures in our library CausalDisco at https://causaldisco.github.io/CausalDisco/.
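The sorting idea in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the CausalDisco API: the function names are hypothetical, only the sorting step is shown (the regression step of $R^2$-SortnRegress is omitted), and each variable's $R^2$ from a linear regression on all other variables is computed through the standard identity $R^2_i = 1 - 1/(C^{-1})_{ii}$, where $C$ is the sample correlation matrix.

```python
import random
from statistics import fmean


def correlation_matrix(columns):
    """Sample correlation matrix of a list of equally long columns."""
    unit = []
    for col in columns:
        mean = fmean(col)
        centered = [v - mean for v in col]
        norm = sum(v * v for v in centered) ** 0.5
        unit.append([v / norm for v in centered])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    d = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def r2_scores(columns):
    """R^2 of each variable when regressed linearly on all other variables."""
    inv = invert(correlation_matrix(columns))
    return [1.0 - 1.0 / inv[i][i] for i in range(len(columns))]


def r2_sort_order(columns):
    """Variable indices sorted by increasing R^2 (the candidate causal order)."""
    scores = r2_scores(columns)
    return sorted(range(len(columns)), key=lambda i: scores[i])


# Toy linear ANM (an assumption for illustration): x0 -> x1, x0 -> x2, x1 -> x2,
# unit edge weights and unit-variance Gaussian noise.
# The population R^2 values are 2/3, 3/4 and 5/6, so they increase along the order.
rng = random.Random(0)
n = 20000
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [a + rng.gauss(0, 1) for a in x0]
x2 = [a + b + rng.gauss(0, 1) for a, b in zip(x0, x1)]

scores = r2_scores([x0, x1, x2])
order = r2_sort_order([x0, x1, x2])
print(scores, order)
```

Because $R^2$ is computed from the correlation matrix, rescaling any variable leaves the scores and the resulting order unchanged, which is the scale invariance the abstract emphasizes.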

66.0 · AI · Apr 30
Causal Foundations of Collective Agency

Frederik Hytting Jørgensen, Sebastian Weichwald, Lewis Hammond

A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents might inadvertently form a collective agent with capabilities and goals distinct from those of any individual. More generally, determining when a group of agents can be viewed as a unified collective agent is a foundational question in the study of interactions and incentives in both biological and artificial systems. We adopt a behavioral perspective in answering this question, ascribing collective agency to a group when viewing the group's joint actions as rational and goal-directed successfully predicts its behavior. We formalize this perspective on collective agency using causal games (causal models of strategic, multi-agent interactions) and causal abstraction (which formalizes when a simple, high-level model faithfully captures a more complex, low-level model). We use this framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make quantitative assessments of the degree of collective agency exhibited by different voting mechanisms. Our framework aims to provide a foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.

ML · Jun 1, 2023
Unfair Utilities and First Steps Towards Improving Them

Frederik Hytting Jørgensen, Sebastian Weichwald, Jonas Peters

Many fairness criteria constrain the policy or choice of predictors, which can have unwanted consequences, in particular when optimizing the policy under such constraints. Here, we advocate focussing instead on the utility function the policy is optimizing for. We define value of information fairness and propose not to use utility functions that violate this criterion. This principle suggests modifying these utility functions such that they satisfy value of information fairness. We describe how this can be done and discuss consequences for the corresponding optimal policies. We apply our framework to thought experiments and the COMPAS data. Focussing on the utility function provides better answers than existing fairness notions: We are not aware of any intuitively fair policy that is disallowed by value of information fairness, and when we find that value of information fairness recommends an intuitively unfair policy, no existing fairness notion finds an intuitively fair policy.

ML · Feb 13, 2024 · Code
Adjustment Identification Distance: A gadjid for Causal Structure Learning

Leonard Henckel, Theo Würtzen, Sebastian Weichwald

Evaluating graphs learned by causal discovery algorithms is difficult: The number of edges that differ between two graphs does not reflect how the graphs differ with respect to the identifying formulas they suggest for causal effects. We introduce a framework for developing causal distances between graphs which includes the structural intervention distance for directed acyclic graphs as a special case. We use this framework to develop improved adjustment-based distances as well as extensions to completed partially directed acyclic graphs and causal orders. We develop new reachability algorithms to compute the distances efficiently and to prove their low polynomial time complexity. In our package gadjid (open source at https://github.com/CausalDisco/gadjid), we provide implementations of our distances; they are orders of magnitude faster with proven lower time complexity than the structural intervention distance and thereby provide a success metric for causal discovery that scales to graph sizes that were previously prohibitive.

AI · Jun 18, 2025 · Code
Linear-Time Primitives for Algorithm Development in Graphical Causal Inference

Marcel Wienöbst, Sebastian Weichwald, Leonard Henckel

We introduce CIfly, a framework for efficient algorithmic primitives in graphical causal inference that isolates reachability as a reusable core operation. It builds on the insight that many causal reasoning tasks can be reduced to reachability in purpose-built state-space graphs that can be constructed on the fly during traversal. We formalize a rule table schema for specifying such algorithms and prove they run in linear time. We establish CIfly as a more efficient alternative to the common primitives moralization and latent projection, which we show are computationally equivalent to Boolean matrix multiplication. Our open-source Rust implementation parses rule table text files and runs the specified CIfly algorithms providing high-performance execution accessible from Python and R. We demonstrate CIfly's utility by re-implementing a range of established causal inference tasks within the framework and by developing new algorithms for instrumental variables. These contributions position CIfly as a flexible and scalable backbone for graphical causal inference, guiding algorithm development and enabling easy and efficient deployment.
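The reduction of a causal reasoning task to reachability can be illustrated with the classic d-connection search over (node, direction) states, where successor states are generated on the fly during traversal. The sketch below is a hedged illustration of that general idea only: it does not use CIfly, does not follow its rule table schema, and all names are hypothetical. Unlike a pure single-pass rule table, it also precomputes the ancestors of the conditioning set.

```python
from collections import deque


def d_connected(parents, x, z):
    """Nodes d-connected to x given the set z in a DAG.

    `parents` maps every node to the set of its parents. States are pairs
    (node, direction): "up" means the walk arrived from a child, "down"
    means it arrived from a parent. Each state is expanded at most once,
    so the search is linear in the size of the graph.
    """
    children = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].add(v)

    # Ancestors of z (including z): colliders in this set are open.
    anc, stack = set(), list(z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])

    start = (x, "up")
    seen, queue, reachable = {start}, deque([start]), set()
    while queue:
        v, direction = queue.popleft()
        if v not in z:
            reachable.add(v)
        successors = []
        if direction == "up" and v not in z:
            # Non-collider on the walk: continue to parents and children.
            successors += [(p, "up") for p in parents[v]]
            successors += [(c, "down") for c in children[v]]
        elif direction == "down":
            if v not in z:
                # Chain through v: continue to children.
                successors += [(c, "down") for c in children[v]]
            if v in anc:
                # Collider at v that is opened by conditioning.
                successors += [(p, "up") for p in parents[v]]
        for state in successors:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    reachable.discard(x)
    return reachable


# Collider a -> c <- b with a descendant c -> d.
dag = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
marginal = d_connected(dag, "a", set())
given_d = d_connected(dag, "a", {"d"})
print(marginal, given_d)
```

In the toy graph, `a` and `b` are d-separated marginally but become d-connected once the collider's descendant `d` is conditioned on.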

LG · Feb 12, 2022 · Code
Learning by Doing: Controlling a Dynamical System using Causality, Control, and Reinforcement Learning

Sebastian Weichwald, Søren Wengel Mogensen, Tabitha Edith Lee et al.

Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system by excitation strategies to then apply model-based design techniques to control the system. In (non-model-based) reinforcement learning, one directly optimizes a reward. In causality, one focus is on identifiability of causal structure. We believe that combining the different views might create synergies and this competition is meant as a first step toward such synergies. The participants had access to observational and (offline) interventional data generated by dynamical systems. Track CHEM considers an open-loop problem in which a single impulse at the beginning of the dynamics can be set, while Track ROBO considers a closed-loop problem in which control variables can be set at each time step. The goal in both tracks is to infer controls that drive the system to a desired state. Code is open-sourced (https://github.com/LearningByDoingCompetition/learningbydoing-comp) to reproduce the winning solutions of the competition and to facilitate trying out new methods on the competition tasks.

ML · Feb 26, 2021 · Code
Beware of the Simulated DAG! Causal Discovery Benchmarks May Be Easy To Game

Alexander G. Reisach, Christof Seiler, Sebastian Weichwald

Simulated DAG models may exhibit properties that, perhaps inadvertently, render their structure identifiable and unexpectedly affect structure learning algorithms. Here, we show that marginal variance tends to increase along the causal order for generically sampled additive noise models. We introduce varsortability as a measure of the agreement between the order of increasing marginal variance and the causal order. For commonly sampled graphs and model parameters, we show that the remarkable performance of some continuous structure learning algorithms can be explained by high varsortability and matched by a simple baseline method. Yet, this performance may not transfer to real-world data where varsortability may be moderate or dependent on the choice of measurement scales. On standardized data, the same algorithms fail to identify the ground-truth DAG or its Markov equivalence class. While standardization removes the pattern in marginal variance, we show that data generating processes that incur high varsortability also leave a distinct covariance pattern that may be exploited even after standardization. Our findings challenge the significance of generic benchmarks with independently drawn parameters. The code is available at https://github.com/Scriddie/Varsortability.
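A simplified version of the agreement measure can be sketched as follows. This is an illustrative assumption rather than the definition or code from the linked repository: it scores ancestor-descendant pairs instead of weighting directed paths, counts ties as one half, and uses hypothetical function names.

```python
import random
from statistics import fmean, pstdev, pvariance


def descendants(children, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), list(children[node])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children[v])
    return seen


def pairwise_varsortability(children, data, tol=1e-9):
    """Fraction of ancestor-descendant pairs whose variance increases.

    1.0 means variance order agrees with the causal order, 0.5 is chance
    level, and variances equal up to relative tolerance `tol` count as 0.5.
    """
    var = {name: pvariance(col) for name, col in data.items()}
    agree, total = 0.0, 0
    for a in children:
        for d in descendants(children, a):
            total += 1
            if abs(var[a] - var[d]) <= tol * max(var[a], var[d]):
                agree += 0.5
            elif var[a] < var[d]:
                agree += 1.0
    return agree / total if total else float("nan")


def standardize(col):
    """Rescale a column to zero mean and unit variance."""
    mean, sd = fmean(col), pstdev(col)
    return [(v - mean) / sd for v in col]


# Toy chain x0 -> x1 -> x2 with unit weights and unit-variance noise,
# so the marginal variances are roughly 1, 2 and 3.
rng = random.Random(0)
n = 5000
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [a + rng.gauss(0, 1) for a in x0]
x2 = [b + rng.gauss(0, 1) for b in x1]
children = {"x0": ["x1"], "x1": ["x2"], "x2": []}
data = {"x0": x0, "x1": x1, "x2": x2}

raw = pairwise_varsortability(children, data)
standardized = pairwise_varsortability(
    children, {name: standardize(col) for name, col in data.items()}
)
print(raw, standardized)
```

On the raw scale the variance order matches the causal order in this toy chain, while standardization makes all variances equal and so pushes the score to chance level, mirroring the point that the variance pattern depends on measurement scale.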

ML · Feb 21, 2020 · Code
Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values

Sebastian Weichwald, Martin E Jakobsen, Phillip B Mogensen et al.

In this article, we describe the algorithms for causal structure learning from time series data that won the Causality 4 Climate competition at the Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine how our combination of established ideas achieves competitive performance on semi-realistic and realistic time series data exhibiting common challenges in real-world Earth sciences data. In particular, we discuss a) a rationale for leveraging linear methods to identify causal links in non-linear systems, b) a simulation-backed explanation as to why large regression coefficients may predict causal links better in practice than small p-values and thus why normalising the data may sometimes hinder causal structure learning. For benchmark usage, we detail the algorithms here and provide implementations at https://github.com/sweichwald/tidybench. We propose the presented competition-proven methods for baseline benchmark comparisons to guide the development of novel algorithms for structure learning from time series.

ME · Dec 3, 2015 · Code
MERLiN: Mixture Effect Recovery in Linear Networks

Sebastian Weichwald, Moritz Grosse-Wentrup, Arthur Gretton

Causal inference concerns the identification of cause-effect relationships between variables, e.g. establishing whether a stimulus affects activity in a certain brain region. The observed variables themselves often do not constitute meaningful causal variables, however, and linear combinations need to be considered. In electroencephalographic studies, for example, one is not interested in establishing cause-effect relationships between electrode signals (the observed variables), but rather between cortical signals (the causal variables) which can be recovered as linear combinations of electrode signals. We introduce MERLiN (Mixture Effect Recovery in Linear Networks), a family of causal inference algorithms that implement a novel means of constructing causal variables from non-causal variables. We demonstrate through application to EEG data how the basic MERLiN algorithm can be extended for application to different (neuroimaging) data modalities. Given an observed linear mixture, the algorithms can recover a causal variable that is a linear effect of another given variable. That is, MERLiN allows us to recover a cortical signal that is affected by activity in a certain brain region, while not being a direct effect of the stimulus. The Python/Matlab implementation for all presented algorithms is available on https://github.com/sweichwald/MERLiN

21.1 · ML · Apr 10
Identifying Causal Effects Using a Single Proxy Variable

Silvan Vollmer, Niklas Pfister, Sebastian Weichwald

Unobserved confounding is a key challenge when estimating causal effects from a treatment on an outcome in scientific applications. In this work, we assume that we observe a single, potentially multi-dimensional proxy variable of the unobserved confounder and that we know the mechanism that generates the proxy from the confounder. Under a completeness assumption on this mechanism, which we call Single Proxy Identifiability of Causal Effects or simply SPICE, we prove that causal effects are identifiable. We extend the proxy-based causal identifiability results by Kuroki and Pearl (2014); Pearl (2010) to higher dimensions, more flexible functional relationships and a broader class of distributions. Further, we develop a neural network based estimation framework, SPICE-Net, to estimate causal effects, which is applicable to both discrete and continuous treatments.

78.1 · ME · May 7
A Topological Sorting Criterion for Random Causal Directed Acyclic Graphs

Alexander G. Reisach, Antoine Chambaz, Gilles Blanchard et al.

Random directed acyclic graphs (DAGs) based on imposing an order on Erdős-Rényi and scale-free random graphs are widely used for evaluating causal discovery algorithms. We show that in such DAGs, the set of nodes reachable via open paths, termed relatives, increases monotonically along the causal order. We assess the prevalence of this pattern numerically, and demonstrate that it can be exploited for causal order recovery via sorting by the estimated number of relatives. We note that many simulations in the literature feature settings where this yields an excellent proxy for the causal order, and show that a strict increase of relatives along the causal order leads to a singular Markov equivalence class. We propose sampling time-series DAGs as a possible alternative and discuss implications for causal discovery algorithms and their evaluation on synthetic data.
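The sorting criterion can be illustrated on a known graph. The sketch below rests on two assumptions that go beyond the abstract: it reads "reachable via open paths" as open given the empty conditioning set (so two nodes are relatives when they share a common ancestor, counting each node as its own ancestor), and it counts relatives from the true DAG rather than estimating them from data. The function names are hypothetical.

```python
def ancestors_inclusive(parents, node):
    """The node itself together with all of its ancestors."""
    seen, stack = set(), [node]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def relatives(parents, node):
    """Nodes connected to `node` by a collider-free path (a shared ancestor)."""
    anc = {v: ancestors_inclusive(parents, v) for v in parents}
    return {u for u in parents if u != node and anc[u] & anc[node]}


# Toy DAG: x0 -> x2 <- x1 and x2 -> x3.
parents = {"x0": [], "x1": [], "x2": ["x0", "x1"], "x3": ["x2"]}
counts = {v: len(relatives(parents, v)) for v in parents}

# Sorting by the number of relatives (ties keep their input order).
order = sorted(parents, key=lambda v: counts[v])
print(counts, order)
```

Here the two root nodes are not relatives of each other because their only connecting path passes through a collider, so they have fewer relatives than the downstream nodes and are sorted first.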

ML · Oct 30, 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling

Emanuele Marconato, Sébastien Lachapelle, Sebastian Weichwald et al.

We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models, such as the vector difference between the representations of "easy" and "easiest" being parallel to that between "lucky" and "luckiest". For this, we ask whether finding a linear property in one model implies that any model that induces the same distribution has that property, too. To answer that, we first prove an identifiability result to characterize distribution-equivalent next-token predictors, lifting a diversity requirement of previous results. Second, based on a refinement of relational linearity [Paccanaro and Hinton, 2001; Hernandez et al., 2024], we show how many notions of linearity are amenable to our analysis. Finally, we show that under suitable conditions, these linear properties hold either in all or in none of the distribution-equivalent next-token predictors.

ML · Jan 31, 2025
What is causal about causal models and representations?

Frederik Hytting Jørgensen, Luigi Gresele, Sebastian Weichwald

Causal Bayesian networks are 'causal' models since they make predictions about interventional distributions. To connect such causal model predictions to real-world outcomes, we must determine which actions in the world correspond to which interventions in the model. For example, to interpret an action as an intervention on a treatment variable, the action will presumably have to a) change the distribution of treatment in a way that corresponds to the intervention, and b) not change other aspects, such as how the outcome depends on the treatment; while the marginal distributions of some variables may change as an effect. We introduce a formal framework to make such requirements for different interpretations of actions as interventions precise. We prove that the seemingly natural interpretation of actions as interventions is circular: Under this interpretation, every causal Bayesian network that correctly models the observational distribution is trivially also interventionally valid, and no action yields empirical data that could possibly falsify such a model. We prove an impossibility result: No interpretation exists that is non-circular and simultaneously satisfies a set of natural desiderata. Instead, we examine non-circular interpretations that may violate some desiderata and show how this may in turn enable the falsification of causal models. By rigorously examining how a causal Bayesian network could be a 'causal' model of the world instead of merely a mathematical object, our formal framework contributes to the conceptual foundations of causal representation learning, causal discovery, and causal abstraction, while also highlighting some limitations of existing approaches.

ML · Oct 6, 2025
Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning

Marcel Wienöbst, Leonard Henckel, Sebastian Weichwald

We present FLOP (Fast Learning of Order and Parents), a score-based causal discovery algorithm for linear models. It pairs fast parent selection with iterative Cholesky-based score updates, cutting run-times over prior algorithms. This makes it feasible to fully embrace discrete search, enabling iterated local search with principled order initialization to find graphs with scores at or close to the global optimum. The resulting structures are highly accurate across benchmarks, with near-perfect recovery in standard settings. This performance calls for revisiting discrete search over graphs as a reasonable approach to causal discovery.

ML · Mar 29, 2021
Compositional Abstraction Error and a Category of Causal Models

Eigil F. Rischel, Sebastian Weichwald

Interventional causal models describe several joint distributions over some variables used to describe a system, one for each intervention setting. They provide a formal recipe for how to move between the different joint distributions and make predictions about the variables upon intervening on the system. Yet, it is difficult to formalise how we may change the underlying variables used to describe the system, say moving from fine-grained to coarse-grained variables. Here, we argue that compositionality is a desideratum for such model transformations and the associated errors: When abstracting a reference model M iteratively, first obtaining M' and then further simplifying that to obtain M'', we expect the composite transformation from M to M'' to exist and its error to be bounded by the errors incurred by each individual transformation step. Category theory, the study of mathematical objects via compositional transformations between them, offers a natural language to develop our framework for model transformations and abstractions. We introduce a category of finite interventional causal models and, leveraging theory of enriched categories, prove the desired compositionality properties for our framework.

ML · Jun 4, 2018
Robustifying Independent Component Analysis by Adjusting for Group-Wise Stationary Noise

Niklas Pfister, Sebastian Weichwald, Peter Bühlmann et al.

We introduce coroICA, confounding-robust independent component analysis, a novel ICA algorithm which decomposes linearly mixed multivariate observations into independent components that are corrupted (and rendered dependent) by hidden group-wise stationary confounding. It extends the ordinary ICA model in a theoretically sound and explicit way to incorporate group-wise (or environment-wise) confounding. We show that our proposed general noise model allows one to perform ICA in settings where other noisy ICA procedures fail. Additionally, it can be used for applications with grouped data by adjusting for different stationary noise within each group. Our proposed noise model has a natural relation to causality and we explain how it can be applied in the context of causal inference. In addition to our theoretical framework, we provide an efficient estimation procedure and prove identifiability of the unmixing matrix under mild assumptions. Finally, we illustrate the performance and robustness of our method on simulated data, provide audible and visual examples, and demonstrate the applicability to real-world scenarios by experiments on publicly available Antarctic ice core data as well as two EEG data sets. We provide a scikit-learn compatible pip-installable Python package coroICA as well as R and Matlab implementations accompanied by documentation at https://sweichwald.de/coroICA/

ML · Jul 4, 2017
Causal Consistency of Structural Equation Models

Paul K. Rubenstein, Sebastian Weichwald, Stephan Bongers et al.

Complex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise this notion of consistency in the case of Structural Equation Models (SEMs) by introducing exact transformations between SEMs. This provides a general language to consider, for instance, the different levels of description in the following three scenarios: (a) models with large numbers of variables versus models in which the 'irrelevant' or unobservable variables have been marginalised out; (b) micro-level models versus macro-level models in which the macro-variables are aggregate features of the micro-variables; (c) dynamical time series models versus models of their stationary behaviour. Our analysis stresses the importance of well specified interventions in the causal modelling process and sheds light on the interpretation of cyclic SEMs.

HC · May 9, 2017
Personalized Brain-Computer Interface Models for Motor Rehabilitation

Anastasia-Atalanti Mastakouri, Sebastian Weichwald, Ozan Özdenizci et al.

We propose to fuse two currently separate research lines on novel therapies for stroke rehabilitation: brain-computer interface (BCI) training and transcranial electrical stimulation (TES). Specifically, we show that BCI technology can be used to learn personalized decoding models that relate the global configuration of brain rhythms in individual subjects (as measured by EEG) to their motor performance during 3D reaching movements. We demonstrate that our models capture substantial across-subject heterogeneity, and argue that this heterogeneity is a likely cause of limited effect sizes observed in TES for enhancing motor performance. We conclude by discussing how our personalized models can be used to derive optimal TES parameters, e.g., stimulation site and frequency, for individual patients.

NC · May 23, 2016
A note on the expected minimum error probability in equientropic channels

Sebastian Weichwald, Tatiana Fomina, Bernhard Schölkopf et al.

While the channel capacity reflects a theoretical upper bound on the achievable information transmission rate in the limit of infinitely many bits, it does not characterise the information transfer of a given encoding routine with finitely many bits. In this note, we characterise the quality of a code (i.e., a given encoding routine) by an upper bound on the expected minimum error probability that can be achieved when using this code. We show that for equientropic channels this upper bound is minimal for codes with maximal marginal entropy. As an instructive example we show for the additive white Gaussian noise (AWGN) channel that random coding, which is also a capacity-achieving code, indeed maximises the marginal entropy in the limit of infinite messages.

ME · May 2, 2016
Recovery of non-linear cause-effect relationships from linearly mixed neuroimaging data

Sebastian Weichwald, Arthur Gretton, Bernhard Schölkopf et al.

Causal inference concerns the identification of cause-effect relationships between variables. However, often only linear combinations of variables constitute meaningful causal variables. For example, recovering the signal of a cortical source from electroencephalography requires a well-tuned combination of signals recorded at multiple electrodes. We recently introduced the MERLiN (Mixture Effect Recovery in Linear Networks) algorithm that is able to recover, from an observed linear mixture, a causal variable that is a linear effect of another given variable. Here we relax the assumption of this cause-effect relationship being linear and present an extended algorithm that can pick up non-linear cause-effect relationships. Thus, the main contribution is an algorithm (and ready to use code) that has broader applicability and allows for a richer model class. Furthermore, a comparative analysis indicates that the assumption of linear cause-effect relationships is not restrictive in analysing electroencephalographic data.

MS · Mar 10, 2016
Pymanopt: A Python Toolbox for Optimization on Manifolds using Automatic Differentiation

James Townsend, Niklas Koep, Sebastian Weichwald

Optimization on manifolds is a class of methods for optimization of an objective function, subject to constraints which are smooth, in the sense that the set of points which satisfy the constraints admits the structure of a differentiable manifold. While many optimization problems are of the described form, technicalities of differential geometry and the laborious calculation of derivatives pose a significant barrier for experimenting with these methods. We introduce Pymanopt (available at https://pymanopt.github.io), a toolbox for optimization on manifolds, implemented in Python, that, similarly to the Manopt Matlab toolbox, implements several manifold geometries and optimization algorithms. Moreover, we lower the barriers to users further by using automated differentiation for calculating derivative information, saving users time and saving them from potential calculation and implementation errors.
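To make the problem class concrete, here is a minimal hand-written sketch of Riemannian gradient descent on the unit sphere. It deliberately does not use Pymanopt or automatic differentiation: the gradient is derived by hand, which is the laborious step the toolbox is meant to remove, and all function names are hypothetical. The cost is $f(x) = -x^\top A x$ subject to $\|x\| = 1$, whose minimizer is a dominant eigenvector of the symmetric matrix $A$.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(x):
    """Retraction onto the unit sphere: rescale to unit length."""
    norm = dot(x, x) ** 0.5
    return [v / norm for v in x]


def sphere_gradient_descent(A, x_init, step=0.1, iterations=200):
    """Minimize f(x) = -x^T A x over the unit sphere."""
    x = normalize(x_init)
    for _ in range(iterations):
        # Euclidean gradient of f at x (A is assumed symmetric).
        egrad = [-2.0 * v for v in matvec(A, x)]
        # Riemannian gradient: project onto the tangent space at x.
        radial = dot(egrad, x)
        rgrad = [g - radial * xi for g, xi in zip(egrad, x)]
        # Step in the tangent direction, then retract back to the sphere.
        x = normalize([xi - step * g for xi, g in zip(x, rgrad)])
    return x


A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
x = sphere_gradient_descent(A, [1.0, 0.0])
rayleigh = dot(x, matvec(A, x))
print(x, rayleigh)
```

The two manifold-specific ingredients are the tangent-space projection and the retraction; a toolbox for optimization on manifolds packages these per manifold so that only the cost function has to be supplied.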

ML · Dec 15, 2015
Causal and anti-causal learning in pattern recognition for neuroimaging

Sebastian Weichwald, Bernhard Schölkopf, Tonio Ball et al.

Pattern recognition in neuroimaging distinguishes between two types of models: encoding and decoding models. This distinction is based on the insight that brain state features that are found to be relevant in an experimental paradigm carry a different meaning in encoding than in decoding models. In this paper, we argue that this distinction is not sufficient: Relevant features in encoding and decoding models carry a different meaning depending on whether they represent causal or anti-causal relations. We provide a theoretical justification for this argument and conclude that causal inference is essential for interpretation in neuroimaging.

ML · Dec 14, 2015
Decoding index finger position from EEG using random forests

Sebastian Weichwald, Timm Meyer, Bernhard Schölkopf et al.

While invasively recorded brain activity is known to provide detailed information on motor commands, it is an open question at what level of detail information about positions of body parts can be decoded from non-invasively acquired signals. In this work it is shown that index finger positions can be differentiated from non-invasive electroencephalographic (EEG) recordings in healthy human subjects. Using a leave-one-subject-out cross-validation procedure, a random forest distinguished different index finger positions on a numerical keyboard above chance-level accuracy. Among the different spectral features investigated, high $β$-power (20-30 Hz) over contralateral sensorimotor cortex carried most information about finger position. Thus, these findings indicate that finger position is in principle decodable from non-invasive features of brain activity that generalize across individuals.

ML · Nov 15, 2015
Causal interpretation rules for encoding and decoding models in neuroimaging

Sebastian Weichwald, Timm Meyer, Ozan Özdenizci et al.

Causal terminology is often introduced in the interpretation of encoding and decoding models trained on neuroimaging data. In this article, we investigate which causal statements are warranted and which ones are not supported by empirical evidence. We argue that the distinction between encoding and decoding models is not sufficient for this purpose: relevant features in encoding and decoding models carry a different meaning in stimulus- and in response-based experimental paradigms. We show that only encoding models in the stimulus-based setting support unambiguous causal interpretations. By combining encoding and decoding models trained on the same data, however, we obtain insights into causal relations beyond those that are implied by each individual model type. We illustrate the empirical relevance of our theoretical findings on EEG data recorded during a visuo-motor learning task.