LGNov 7, 2023
ClimateSet: A Large-Scale Climate Model Dataset for Machine LearningJulia Kaltenborn, Charlotte E. E. Lange, Venkatesh Ramesh et al. · mila
Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. Many of those tasks have been addressed on datasets created with single climate models. However, both the climate science and ML communities have suggested that to address those tasks at scale, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. In addition, we provide a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios. We showcase the potential of our dataset by using it as a benchmark for ML-based climate model emulation. We gain new insights about the performance and generalization capabilities of the different ML models by analyzing their performance across different climate models. Furthermore, the dataset can be used to train an ML emulator on several climate models instead of just one. Such a "super emulator" can quickly project new climate change scenarios, complementing existing scenarios already provided to policymakers. We believe ClimateSet will create the basis needed for the ML community to tackle climate-related tasks at scale.
IROct 7, 2022
CLEAR: Causal Explanations from Attention in Neural RecommendersShami Nisimov, Raanan Y. Rohekar, Yaniv Gurwicz et al.
We present CLEAR, a method for learning session-specific causal graphs, in the possible presence of latent confounders, from attention in pre-trained attention-based recommenders. These causal graphs describe user behavior, within the context captured by attention, and can provide a counterfactual explanation for a recommendation. In essence, these causal graphs allow answering "why" questions uniquely for any specific session. Using empirical evaluations we show that, compared to naively using attention weights to explain input-output relations, counterfactual explanations found by CLEAR are shorter and an alternative recommendation is ranked higher in the original top-k recommendations.
CVAug 28, 2024Code
ClimDetect: A Benchmark Dataset for Climate Change Detection and AttributionSungduk Yu, Brian L. White, Anahita Bhiwandiwalla et al.
Detecting and attributing temperature increases driven by climate change is crucial for understanding global warming and informing adaptation strategies. However, distinguishing human-induced climate signals from natural variability remains challenging for traditional detection and attribution (D&A) methods, which rely on identifying specific "fingerprints" -- spatial patterns expected to emerge from external forcings such as greenhouse gas emissions. Deep learning offers promise in discerning these complex patterns within expansive spatial datasets, yet the lack of standardized protocols has hindered consistent comparisons across studies. To address this gap, we introduce ClimDetect, a standardized dataset comprising 1.17M daily climate snapshots paired with target climate change indicator variables. The dataset is curated from both CMIP6 climate model simulations and real-world observation-assimilated reanalysis datasets (ERA5, JRA-3Q, and MERRA-2), and is designed to enhance model accuracy in detecting climate change signals. ClimDetect integrates various input and target variables used in previous research, ensuring comparability and consistency across studies. We also explore the application of vision transformers (ViT) to climate data -- a novel approach that, to our knowledge, has not been attempted before for climate change detection tasks. Our open-access data serve as a benchmark for advancing climate science by enabling end-to-end model development and evaluation. ClimDetect is publicly accessible via Hugging Face dataset repository at: https://huggingface.co/datasets/ClimDetect/ClimDetect.
AIOct 31, 2023
Causal Interpretation of Self-Attention in Pre-Trained TransformersRaanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov
We propose a causal interpretation of self-attention in the Transformer neural network architecture. We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols (tokens). The structural equation model can be interpreted, in turn, as a causal structure over the input symbols under the specific context of the input sequence. Importantly, this interpretation remains valid in the presence of latent confounders. Following this interpretation, we estimate conditional independence relations between input symbols by calculating partial correlations between their corresponding representations in the deepest attention layer. This enables learning the causal structure over an input sequence using existing constraint-based algorithms. In this sense, existing pre-trained Transformers can be utilized for zero-shot causal-discovery. We demonstrate this method by providing causal explanations for the outcomes of Transformers in two tasks: sentiment classification (NLP) and recommendation.
AIJun 1, 2023
From Temporal to Contemporaneous Iterative Causal Discovery in the Presence of Latent ConfoundersRaanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz et al.
We present a constraint-based algorithm for learning causal structures from observational time-series data, in the presence of latent confounders. We assume a discrete-time, stationary structural vector autoregressive process, with both temporal and contemporaneous causal relations. One may ask if temporal and contemporaneous relations should be treated differently. The presented algorithm gradually refines a causal graph by learning long-term temporal relations before short-term ones, where contemporaneous relations are learned last. This ordering of causal relations to be learnt leads to a reduction in the required number of statistical tests. We validate this reduction empirically and demonstrate that it leads to higher accuracy for synthetic data and more plausible causal graphs for real-world data compared to state-of-the-art algorithms.
LGNov 7, 2021Code
Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection BiasRaanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz et al.
We present a sound and complete algorithm, called iterative causal discovery (ICD), for recovering causal graphs in the presence of latent confounders and selection bias. ICD relies on the causal Markov and faithfulness assumptions and recovers the equivalence class of the underlying causal graph. It starts with a complete graph, and consists of a single iterative stage that gradually refines this graph by identifying conditional independence (CI) between connected nodes. Independence and causal relations entailed after any iteration are correct, rendering ICD anytime. Essentially, we tie the size of the CI conditioning set to its distance on the graph from the tested nodes, and increase this value in the successive iteration. Thus, each iteration refines a graph that was recovered by previous iterations having smaller conditioning sets -- a higher statistical power -- which contributes to stability. We demonstrate empirically that ICD requires significantly fewer CI tests and learns more accurate causal graphs compared to FCI, FCI+, and RFCI algorithms (code is available at https://github.com/IntelLabs/causality-lab).
CVApr 3, 2024
LVLM-Interpret: An Interpretability Tool for Large Vision-Language ModelsGabriela Ben Melech Stan, Estelle Aflalo, Raanan Yehezkel Rohekar et al.
In the rapidly evolving landscape of artificial intelligence, multi-modal large language models are emerging as a significant area of interest. These models, which combine various forms of data input, are becoming increasingly popular. However, understanding their internal mechanisms remains a complex task. Numerous advancements have been made in the field of explainability tools and mechanisms, yet there is still much to explore. In this work, we present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models. Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer, and assess the efficacy of the language model in grounding its output in the image. With our application, a user can systematically investigate the model and uncover system limitations, paving the way for enhancements in system capabilities. Finally, we present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
LGDec 5, 2023
Towards Causal Representations of Climate Model DataJulien Boussard, Chandni Nagda, Julia Kaltenborn et al.
Climate models, such as Earth system models (ESMs), are crucial for simulating future climate change based on projected Shared Socioeconomic Pathways (SSP) greenhouse gas emissions scenarios. While ESMs are sophisticated and invaluable, machine learning-based emulators trained on existing simulation data can project additional climate scenarios much faster and are computationally efficient. However, they often lack generalizability and interpretability. This work delves into the potential of causal representation learning, specifically the \emph{Causal Discovery with Single-parent Decoding} (CDSD) method, which could render climate model emulation efficient \textit{and} interpretable. We evaluate CDSD on multiple climate datasets, focusing on emissions, temperature, and precipitation. Our findings shed light on the challenges, limitations, and promise of using CDSD as a stepping stone towards more interpretable and robust climate model emulation.
LGJun 11, 2025
Causal Climate Emulation with Bayesian FilteringSebastian Hickman, Ilija Trajkovic, Julia Kaltenborn et al.
Traditional models of climate change use complex systems of coupled equations to simulate physical processes across the Earth system. These simulations are highly computationally expensive, limiting our predictions of climate change and analyses of its causes and effects. Machine learning has the potential to quickly emulate data from climate models, but current approaches are not able to incorporate physically-based causal relationships. Here, we develop an interpretable climate model emulator based on causal representation learning. We derive a novel approach including a Bayesian filter for stable long-term autoregressive emulation. We demonstrate that our emulator learns accurate climate dynamics, and we show the importance of each one of its components on a realistic synthetic dataset and data from two widely deployed climate models.
AIDec 10, 2024
A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled EnvironmentRaanan Y. Rohekar, Yaniv Gurwicz, Sungduk Yu et al.
Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning for input sequences, and introduce a corresponding confidence score. Empirical tests were conducted in controlled environments using the setups of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, was tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where it generates illegal moves, it also fails to capture a causal structure.
MLJul 11, 2021
Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical WrapperShami Nisimov, Yaniv Gurwicz, Raanan Y. Rohekar et al.
Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, where a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few as possible statistical tests. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs. It then combines them and ensures completeness. By an ablation study, using synthetic data, and by common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.
AIDec 14, 2020
A Single Iterative Step for Anytime Causal DiscoveryRaanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov et al.
We present a sound and complete algorithm for recovering causal graphs from observed, non-interventional data, in the possible presence of latent confounders and selection bias. We rely on the causal Markov and faithfulness assumptions and recover the equivalence class of the underlying causal graph by performing a series of conditional independence (CI) tests between observed variables. We propose a single step that is applied iteratively, such that the independence and causal relations entailed from the resulting graph, after any iteration, is correct and becomes more informative with successive iteration. Essentially, we tie the size of the CI condition set to its distance from the tested nodes on the resulting graph. Each iteration refines the skeleton and orientation by performing CI tests having condition sets that are larger than in the preceding iteration. In an iteration, condition sets of CI tests are constructed from nodes that are within a specified search distance, and the sizes of these condition sets is equal to this search distance. The algorithm then iteratively increases the search distance along with the condition set sizes. Thus, each iteration refines a graph, that was recovered by previous iterations having smaller condition sets -- having a higher statistical power. We demonstrate that our algorithm requires significantly fewer CI tests and smaller condition sets compared to the FCI algorithm. This is evident for both recovering the true underlying graph using a perfect CI oracle, and accurately estimating the graph using limited observed data.
MLMay 30, 2019
Modeling Uncertainty by Learning a Hierarchy of Deep Neural ConnectionsRaanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov et al.
Modeling uncertainty in deep neural networks, despite recent important advances, is still an open problem. Bayesian neural networks are a powerful solution, where the prior over network weights is a design choice, often a normal distribution or other distribution encouraging sparsity. However, this prior is agnostic to the generative process of the input data, which might lead to unwarranted generalization for out-of-distribution tested data. We suggest the presence of a confounder for the relation between the input data and the discriminative function given the target label. We propose an approach for modeling this confounder by sharing neural connectivity patterns between the generative and discriminative networks. This approach leads to a new deep architecture, where networks are sampled from the posterior of local causal structures, and coupled into a compact hierarchy. We demonstrate that sampling networks from this hierarchy, proportionally to their posterior, is efficient and enables estimating various types of uncertainties. Empirical evaluations of our method demonstrate significant improvement compared to state-of-the-art calibration and out-of-distribution detection methods.
MLSep 13, 2018
Bayesian Structure Learning by Recursive BootstrapRaanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov et al.
We address the problem of Bayesian structure learning for domains with hundreds of variables by employing non-parametric bootstrap, recursively. We propose a method that covers both model averaging and model selection in the same framework. The proposed method deals with the main weakness of constraint-based learning---sensitivity to errors in the independence tests---by a novel way of combining bootstrap with constraint-based learning. Essentially, we provide an algorithm for learning a tree, in which each node represents a scored CPDAG for a subset of variables and the level of the node corresponds to the maximal order of conditional independencies that are encoded in the graph. As higher order independencies are tested in deeper recursive calls, they benefit from more bootstrap samples, and therefore more resistant to the curse-of-dimensionality. Moreover, the re-use of stable low order independencies allows greater computational efficiency. We also provide an algorithm for sampling CPDAGs efficiently from their posterior given the learned tree. We empirically demonstrate that the proposed algorithm scales well to hundreds of variables, and learns better MAP models and more reliable causal relationships between variables, than other state-of-the-art-methods.
MLJun 24, 2018
Constructing Deep Neural Networks by Bayesian Network Structure LearningRaanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz et al.
We introduce a principled approach for unsupervised structure learning of deep neural networks. We propose a new interpretation for depth and inter-layer connectivity where conditional independencies in the input distribution are encoded hierarchically in the network structure. Thus, the depth of the network is determined inherently. The proposed method casts the problem of neural network structure learning as a problem of Bayesian network structure learning. Then, instead of directly learning the discriminative structure, it learns a generative graph, constructs its stochastic inverse, and then constructs a discriminative graph. We prove that conditional-dependency relations among the latent variables in the generative graph are preserved in the class-conditional discriminative graph. We demonstrate on image classification benchmarks that the deepest layers (convolutional and dense) of common networks can be replaced by significantly smaller learned structures, while maintaining classification accuracy---state-of-the-art on tested benchmarks. Our structure learning algorithm requires a small computational cost and runs efficiently on a standard desktop CPU.