Mathieu Chevalley

LG
h-index62
8papers
129citations
Novelty51%
AI Score37

8 Papers

LGOct 31, 2022
CausalBench: A Large-scale Benchmark for Network Inference from Single-cell Perturbation Data

Mathieu Chevalley, Yusuf Roohani, Arash Mehrjou et al.

Causal inference is a vital aspect of multiple scientific disciplines and is routinely applied to high-impact applications such as medicine. However, evaluating the performance of causal inference methods in real-world environments is challenging due to the need for observations under both interventional and control conditions. Traditional evaluations conducted on synthetic datasets do not reflect the performance in real-world systems. To address this, we introduce CausalBench, a benchmark suite for evaluating network inference methods on real-world interventional data from large-scale single-cell perturbation experiments. CausalBench incorporates biologically-motivated performance metrics, including new distribution-based interventional metrics. A systematic evaluation of state-of-the-art causal inference methods using our CausalBench suite highlights how poor scalability of current methods limits performance. Moreover, methods that use interventional information do not outperform those that only use observational data, contrary to what is observed on synthetic benchmarks. Thus, CausalBench opens new avenues in causal network inference research and provides a principled and reliable way to track progress in leveraging real-world interventional data.

LGJun 23, 2022
Invariant Causal Mechanisms through Distribution Matching

Mathieu Chevalley, Charlotte Bunne, Andreas Krause et al.

Learning representations that capture the underlying data generating process is a key problem for data efficient and robust use of neural networks. One key property for robustness which the learned representation should capture and which recently received a lot of attention is described by the notion of invariance. In this work we provide a causal perspective and new algorithm for learning invariant representations. Empirically we show that this algorithm works well on a diverse set of tasks and in particular we observe state-of-the-art performance on domain generalization, where we are able to significantly boost the score of existing models.

LGAug 29, 2023
The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation data

Mathieu Chevalley, Jacob Sackett-Sanders, Yusuf Roohani et al.

In drug discovery, mapping interactions between genes within cellular systems is a crucial early step. Such maps are not only foundational for understanding the molecular mechanisms underlying disease biology but also pivotal for formulating hypotheses about potential targets for new medicines. Recognizing the need to elevate the construction of these gene-gene interaction networks, especially from large-scale, real-world datasets of perturbed single cells, the CausalBench Challenge was initiated. This challenge aimed to inspire the machine learning community to enhance state-of-the-art methods, emphasizing better utilization of expansive genetic perturbation data. Using the framework provided by the CausalBench benchmark, participants were tasked with refining the current methodologies or proposing new ones. This report provides an analysis and summary of the methods submitted during the challenge to give a partial image of the state of the art at the time of the challenge. Notably, the winning solutions significantly improved performance compared to previous baselines, establishing a new state of the art for this critical task in biology and medicine.

LGNov 4, 2025
Theoretical Guarantees for Causal Discovery on Large Random Graphs

Mathieu Chevalley, Arash Mehrjou, Patrick Schwab

We investigate theoretical guarantees for the false-negative rate (FNR) -- the fraction of true causal edges whose orientation is not recovered, under single-variable random interventions and an $ε$-interventional faithfulness assumption that accommodates latent confounding. For sparse Erdős--Rényi directed acyclic graphs, where the edge probability scales as $p_e = Θ(1/d)$, we show that the FNR concentrates around its mean at rate $O(\frac{\log d}{\sqrt d})$, implying that large deviations above the expected error become exponentially unlikely as dimensionality increases. This concentration ensures that derived upper bounds hold with high probability in large-scale settings. Extending the analysis to generalized Barabási--Albert graphs reveals an even stronger phenomenon: when the degree exponent satisfies $γ> 3$, the deviation width scales as $O(d^{β- \frac{1}{2}})$ with $β= 1/(γ- 1) < \frac{1}{2}$, and hence vanishes in the limit. This demonstrates that realistic scale-free topologies intrinsically regularize causal discovery, reducing variability in orientation error. These finite-dimension results provide the first dimension-adaptive, faithfulness-robust guarantees for causal structure recovery, and challenge the intuition that high dimensionality and network heterogeneity necessarily hinder accurate discovery. Our simulation results corroborate these theoretical predictions, showing that the FNR indeed concentrates and often vanishes in practice as dimensionality grows.

LGMar 30, 2025
In-silico biological discovery with large perturbation models

Djordje Miladinovic, Tobias Höppe, Mathieu Chevalley et al.

Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks -- from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here, we present the Large Perturbation Model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks.

GNJan 13, 2025
Multi-megabase scale genome interpretation with genetic language models

Frederik Träuble, Lachlan Stuart, Andreas Georgiou et al.

Understanding how molecular changes caused by genetic variation drive disease risk is crucial for deciphering disease mechanisms. However, interpreting genome sequences is challenging because of the vast size of the human genome, and because its consequences manifest across a wide range of cells, tissues and scales -- spanning from molecular to whole organism level. Here, we present Phenformer, a multi-scale genetic language model that learns to generate mechanistic hypotheses as to how differences in genome sequence lead to disease-relevant changes in expression across cell types and tissues directly from DNA sequences of up to 88 million base pairs. Using whole genome sequencing data from more than 150 000 individuals, we show that Phenformer generates mechanistic hypotheses about disease-relevant cell and tissue types that match literature better than existing state-of-the-art methods, while using only sequence data. Furthermore, disease risk predictors enriched by Phenformer show improved prediction performance and generalisation to diverse populations. Accurate multi-megabase scale interpretation of whole genomes without additional experimental data enables both a deeper understanding of molecular mechanisms involved in disease and improved disease risk prediction at the level of individuals.

LGOct 11, 2024
Efficient Differentiable Discovery of Causal Order

Mathieu Chevalley, Arash Mehrjou, Patrick Schwab

In the algorithm Intersort, Chevalley et al. (2024) proposed a score-based method to discover the causal order of variables in a Directed Acyclic Graph (DAG) model, leveraging interventional data to outperform existing methods. However, as a score-based method over the permutahedron, Intersort is computationally expensive and non-differentiable, limiting its ability to be utilised in problems involving large-scale datasets, such as those in genomics and climate models, or to be integrated into end-to-end gradient-based learning frameworks. We address this limitation by reformulating Intersort using differentiable sorting and ranking techniques. Our approach enables scalable and differentiable optimization of causal orderings, allowing the continuous score function to be incorporated as a regularizer in downstream tasks. Empirical results demonstrate that causal discovery algorithms benefit significantly from regularizing on the causal order, underscoring the effectiveness of our method. Our work opens the door to efficiently incorporating regularization for causal order into the training of differentiable models and thereby addresses a long-standing limitation of purely associational supervised learning.

CYNov 13, 2019
By the user, for the user: A user-centric approach to quantifying the privacy of websites

Matius Chairani, Mathieu Chevalley, Abderrahmane Lazraq et al.

Third-party tracking is common on almost all commercially operated websites. Prior work has studied in detail the extent of third-party tracking on the web, detection of third-party trackers, and defending against third-party tracking. Existing research and tools have also attempted to inform web users of trackers and the extent of their privacy violations. However, existing tools do not take into account users' perceptions of and understanding of the extent of trackers on the web. Taking these factors into account is important for the usability of such tools so that users can be aware and protect themselves to a reasonable and necessary extent that aligns with their overall comfort with trackers. In this paper, we elicit user perceptions and preferences about different trackers on various websites through an online survey of 43 users. We use this data to bootstrap a privacy scoring system. This scoring system weights the usage of trackers and the dispersion of user data within a page to third parties, with the type of website being visited. Our work presents a proof-of-concept methodology and tool to calculate a user-centric privacy score with preliminary bootstrap user data. We conclude with concrete future directions.