CYMay 26, 2022
Flexible Group Fairness Metrics for Survival AnalysisRaphael Sonabend, Florian Pfisterer, Alan Mishler et al.
Algorithmic fairness is an increasingly important field concerned with detecting and mitigating biases in machine learning models. There has been a wealth of literature for algorithmic fairness in regression and classification however there has been little exploration of the field for survival analysis. Survival analysis is the prediction task in which one attempts to predict the probability of an event occurring over time. Survival predictions are particularly important in sensitive settings such as when utilising machine learning for diagnosis and prognosis of patients. In this paper we explore how to utilise existing survival metrics to measure bias with group fairness metrics. We explore this in an empirical experiment with 29 survival datasets and 8 measures. We find that measures of discrimination are able to capture bias well whereas there is less clarity with measures of calibration and scoring rules. We suggest further areas for research including prediction-based fairness metrics for distribution predictions.
LGFeb 10Code
Linear-LLM-SCM: Benchmarking LLMs for Coefficient Elicitation in Linear-Gaussian Causal ModelsKanta Yamaoka, Sumantrak Mukherjee, Thomas Gärtner et al.
Large language models (LLMs) have shown potential in identifying qualitative causal relations, but their ability to perform quantitative causal reasoning -- estimating effect sizes that parametrize functional relationships -- remains underexplored in continuous domains. We introduce Linear-LLM-SCM, a plug-and-play benchmarking framework for evaluating LLMs on linear Gaussian structural causal model (SCM) parametrization when the DAG is given. The framework decomposes a DAG into local parent-child sets and prompts an LLM to produce a regression-style structural equation per node, which is aggregated and compared against available ground-truth parameters. Our experiments show several challenges in such benchmarking tasks, namely, strong stochasticity in the results in some of the models and susceptibility to DAG misspecification via spurious edges in the continuous domains. Across models, we observe substantial variability in coefficient estimates for some settings and sensitivity to structural and semantic perturbations, highlighting current limitations of LLMs as quantitative causal parameterizers. We also open-sourced the benchmarking framework so that researchers can utilize their DAGs and any off-the-shelf LLMs plug-and-play for evaluation in their domains effortlessly.
IRFeb 12, 2024
Had enough of experts? Quantitative knowledge retrieval from large language modelsDavid Selby, Kai Spriestersbach, Yuichiro Iwashita et al.
Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences, however their utility for quantitative information retrieval is less well understood. Here we explore the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid two data analysis tasks: elicitation of prior distributions for Bayesian models and imputation of missing data. We introduce a framework that leverages LLMs to enhance Bayesian workflows by eliciting expert-like prior knowledge and imputing missing data. Tested on diverse datasets, this approach can improve predictive accuracy and reduce data requirements, offering significant potential in healthcare, environmental science and engineering applications. We discuss the implications and challenges of treating LLMs as 'experts'.
LGFeb 13, 2025
Neural Spatiotemporal Point Processes: Trends and ChallengesSumantrak Mukherjee, Mouad Elhamdi, George Mohler et al.
Spatiotemporal point processes (STPPs) are probabilistic models for events occurring in continuous space and time. Real-world event data often exhibit intricate dependencies and heterogeneous dynamics. By incorporating modern deep learning techniques, STPPs can model these complexities more effectively than traditional approaches. Consequently, the fusion of neural methods with STPPs has become an active and rapidly evolving research area. In this review, we categorize existing approaches, unify key design choices, and explain the challenges of working with this data modality. We further highlight emerging trends and diverse application domains. Finally, we identify open challenges and gaps in the literature.
LGDec 14, 2025
Co-Exploration and Co-Exploitation via Shared Structure in Multi-Task BanditsSumantrak Mukherjee, Serafima Lebedeva, Valentin Margraf et al.
We propose a novel Bayesian framework for efficient exploration in contextual multi-task multi-armed bandit settings, where the context is only observed partially and dependencies between reward distributions are induced by latent context variables. In order to exploit these structural dependencies, our approach integrates observations across all tasks and learns a global joint distribution, while still allowing personalised inference for new tasks. In this regard, we identify two key sources of epistemic uncertainty, namely structural uncertainty in the latent reward dependencies across arms and tasks, and user-specific uncertainty due to incomplete context and limited interaction history. To put our method into practice, we represent the joint distribution over tasks and rewards using a particle-based approximation of a log-density Gaussian process. This representation enables flexible, data-driven discovery of both inter-arm and inter-task dependencies without prior assumptions on the latent variables. Empirically, we demonstrate that our method outperforms baselines such as hierarchical model bandits, especially in settings with model misspecification or complex latent heterogeneity.
LGMar 31, 2025
When Counterfactual Reasoning Fails: Chaos and Real-World ComplexityYahya Aalaila, Gerrit Großmann, Sumantrak Mukherjee et al.
Counterfactual reasoning, a cornerstone of human cognition and decision-making, is often seen as the 'holy grail' of causal learning, with applications ranging from interpreting machine learning models to promoting algorithmic fairness. While counterfactual reasoning has been extensively studied in contexts where the underlying causal model is well-defined, real-world causal modeling is often hindered by model and parameter uncertainty, observational noise, and chaotic behavior. The reliability of counterfactual analysis in such settings remains largely unexplored. In this work, we investigate the limitations of counterfactual reasoning within the framework of Structural Causal Models. Specifically, we empirically investigate \emph{counterfactual sequence estimation} and highlight cases where it becomes increasingly unreliable. We find that realistic assumptions, such as low degrees of model uncertainty or chaotic dynamics, can result in counterintuitive outcomes, including dramatic deviations between predicted and true counterfactual trajectories. This work urges caution when applying counterfactual reasoning in settings characterized by chaos and uncertainty. Furthermore, it raises the question of whether certain systems may pose fundamental limitations on the ability to answer counterfactual questions about their behavior.
LGNov 5, 2024
Graph Agnostic Causal Bayesian OptimisationSumantrak Mukherjee, Mengyan Zhang, Seth Flaxman et al.
We study the problem of globally optimising a target variable of an unknown causal graph on which a sequence of soft or hard interventions can be performed. The problem of optimising the target variable associated with a causal graph is formalised as Causal Bayesian Optimisation (CBO). We study the CBO problem under the cumulative regret objective with unknown causal graphs for two settings, namely structural causal models with hard interventions and function networks with soft interventions. We propose Graph Agnostic Causal Bayesian Optimisation (GACBO), an algorithm that actively discovers the causal structure that contributes to achieving optimal rewards. GACBO seeks to balance exploiting the actions that give the best rewards against exploring the causal structures and functions. To the best of our knowledge, our work is the first to study causal Bayesian optimization with cumulative regret objectives in scenarios where the graph is unknown or partially known. We show our proposed algorithm outperforms baselines in simulated experiments and real-world applications.
LGJan 16, 2024
X Hacking: The Threat of Misguided AutoMLRahul Sharma, Sergey Redyuk, Sumantrak Mukherjee et al.
Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as SHAP values. We show how easily an automated machine learning pipeline can be adapted to exploit model multiplicity at scale: searching a Rashomon set of 'defensible' models with similar predictive performance to find a desired explanation. We formulate the trade-off between explanation and accuracy as a multi-objective optimisation problem, and illustrate empirically on familiar real-world datasets that, on average, Bayesian optimisation accelerates X-hacking 3-fold for features susceptible to it, versus random sampling. We show the vulnerability of a dataset to X-hacking can be determined by information redundancy among features. Finally, we suggest possible methods for detection and prevention, and discuss ethical implications for the credibility and reproducibility of XAI.