Raanan Y. Rohekar

AI
h-index9
11papers
176citations
Novelty49%
AI Score33

11 Papers

IROct 7, 2022
CLEAR: Causal Explanations from Attention in Neural Recommenders

Shami Nisimov, Raanan Y. Rohekar, Yaniv Gurwicz et al.

We present CLEAR, a method for learning session-specific causal graphs, in the possible presence of latent confounders, from attention in pre-trained attention-based recommenders. These causal graphs describe user behavior, within the context captured by attention, and can provide a counterfactual explanation for a recommendation. In essence, these causal graphs allow answering "why" questions uniquely for any specific session. Using empirical evaluations we show that, compared to naively using attention weights to explain input-output relations, counterfactual explanations found by CLEAR are shorter and an alternative recommendation is ranked higher in the original top-k recommendations.

CVAug 28, 2024Code
ClimDetect: A Benchmark Dataset for Climate Change Detection and Attribution

Sungduk Yu, Brian L. White, Anahita Bhiwandiwalla et al.

Detecting and attributing temperature increases driven by climate change is crucial for understanding global warming and informing adaptation strategies. However, distinguishing human-induced climate signals from natural variability remains challenging for traditional detection and attribution (D&A) methods, which rely on identifying specific "fingerprints" -- spatial patterns expected to emerge from external forcings such as greenhouse gas emissions. Deep learning offers promise in discerning these complex patterns within expansive spatial datasets, yet the lack of standardized protocols has hindered consistent comparisons across studies. To address this gap, we introduce ClimDetect, a standardized dataset comprising 1.17M daily climate snapshots paired with target climate change indicator variables. The dataset is curated from both CMIP6 climate model simulations and real-world observation-assimilated reanalysis datasets (ERA5, JRA-3Q, and MERRA-2), and is designed to enhance model accuracy in detecting climate change signals. ClimDetect integrates various input and target variables used in previous research, ensuring comparability and consistency across studies. We also explore the application of vision transformers (ViT) to climate data -- a novel approach that, to our knowledge, has not been attempted before for climate change detection tasks. Our open-access data serve as a benchmark for advancing climate science by enabling end-to-end model development and evaluation. ClimDetect is publicly accessible via Hugging Face dataset repository at: https://huggingface.co/datasets/ClimDetect/ClimDetect.

AIOct 31, 2023
Causal Interpretation of Self-Attention in Pre-Trained Transformers

Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov

We propose a causal interpretation of self-attention in the Transformer neural network architecture. We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols (tokens). The structural equation model can be interpreted, in turn, as a causal structure over the input symbols under the specific context of the input sequence. Importantly, this interpretation remains valid in the presence of latent confounders. Following this interpretation, we estimate conditional independence relations between input symbols by calculating partial correlations between their corresponding representations in the deepest attention layer. This enables learning the causal structure over an input sequence using existing constraint-based algorithms. In this sense, existing pre-trained Transformers can be utilized for zero-shot causal-discovery. We demonstrate this method by providing causal explanations for the outcomes of Transformers in two tasks: sentiment classification (NLP) and recommendation.

AIJun 1, 2023
From Temporal to Contemporaneous Iterative Causal Discovery in the Presence of Latent Confounders

Raanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz et al.

We present a constraint-based algorithm for learning causal structures from observational time-series data, in the presence of latent confounders. We assume a discrete-time, stationary structural vector autoregressive process, with both temporal and contemporaneous causal relations. One may ask if temporal and contemporaneous relations should be treated differently. The presented algorithm gradually refines a causal graph by learning long-term temporal relations before short-term ones, where contemporaneous relations are learned last. This ordering of causal relations to be learnt leads to a reduction in the required number of statistical tests. We validate this reduction empirically and demonstrate that it leads to higher accuracy for synthetic data and more plausible causal graphs for real-world data compared to state-of-the-art algorithms.

LGNov 7, 2021Code
Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias

Raanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz et al.

We present a sound and complete algorithm, called iterative causal discovery (ICD), for recovering causal graphs in the presence of latent confounders and selection bias. ICD relies on the causal Markov and faithfulness assumptions and recovers the equivalence class of the underlying causal graph. It starts with a complete graph, and consists of a single iterative stage that gradually refines this graph by identifying conditional independence (CI) between connected nodes. Independence and causal relations entailed after any iteration are correct, rendering ICD anytime. Essentially, we tie the size of the CI conditioning set to its distance on the graph from the tested nodes, and increase this value in the successive iteration. Thus, each iteration refines a graph that was recovered by previous iterations having smaller conditioning sets -- a higher statistical power -- which contributes to stability. We demonstrate empirically that ICD requires significantly fewer CI tests and learns more accurate causal graphs compared to FCI, FCI+, and RFCI algorithms (code is available at https://github.com/IntelLabs/causality-lab).

AIDec 10, 2024
A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment

Raanan Y. Rohekar, Yaniv Gurwicz, Sungduk Yu et al.

Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning for input sequences, and introduce a corresponding confidence score. Empirical tests were conducted in controlled environments using the setups of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, was tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where it generates illegal moves, it also fails to capture a causal structure.

MLJul 11, 2021
Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper

Shami Nisimov, Yaniv Gurwicz, Raanan Y. Rohekar et al.

Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, where a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few as possible statistical tests. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs. It then combines them and ensures completeness. By an ablation study, using synthetic data, and by common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.

AIDec 14, 2020
A Single Iterative Step for Anytime Causal Discovery

Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov et al.

We present a sound and complete algorithm for recovering causal graphs from observed, non-interventional data, in the possible presence of latent confounders and selection bias. We rely on the causal Markov and faithfulness assumptions and recover the equivalence class of the underlying causal graph by performing a series of conditional independence (CI) tests between observed variables. We propose a single step that is applied iteratively, such that the independence and causal relations entailed from the resulting graph, after any iteration, is correct and becomes more informative with successive iteration. Essentially, we tie the size of the CI condition set to its distance from the tested nodes on the resulting graph. Each iteration refines the skeleton and orientation by performing CI tests having condition sets that are larger than in the preceding iteration. In an iteration, condition sets of CI tests are constructed from nodes that are within a specified search distance, and the sizes of these condition sets is equal to this search distance. The algorithm then iteratively increases the search distance along with the condition set sizes. Thus, each iteration refines a graph, that was recovered by previous iterations having smaller condition sets -- having a higher statistical power. We demonstrate that our algorithm requires significantly fewer CI tests and smaller condition sets compared to the FCI algorithm. This is evident for both recovering the true underlying graph using a perfect CI oracle, and accurately estimating the graph using limited observed data.

MLMay 30, 2019
Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections

Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov et al.

Modeling uncertainty in deep neural networks, despite recent important advances, is still an open problem. Bayesian neural networks are a powerful solution, where the prior over network weights is a design choice, often a normal distribution or other distribution encouraging sparsity. However, this prior is agnostic to the generative process of the input data, which might lead to unwarranted generalization for out-of-distribution tested data. We suggest the presence of a confounder for the relation between the input data and the discriminative function given the target label. We propose an approach for modeling this confounder by sharing neural connectivity patterns between the generative and discriminative networks. This approach leads to a new deep architecture, where networks are sampled from the posterior of local causal structures, and coupled into a compact hierarchy. We demonstrate that sampling networks from this hierarchy, proportionally to their posterior, is efficient and enables estimating various types of uncertainties. Empirical evaluations of our method demonstrate significant improvement compared to state-of-the-art calibration and out-of-distribution detection methods.

MLSep 13, 2018
Bayesian Structure Learning by Recursive Bootstrap

Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov et al.

We address the problem of Bayesian structure learning for domains with hundreds of variables by employing non-parametric bootstrap, recursively. We propose a method that covers both model averaging and model selection in the same framework. The proposed method deals with the main weakness of constraint-based learning---sensitivity to errors in the independence tests---by a novel way of combining bootstrap with constraint-based learning. Essentially, we provide an algorithm for learning a tree, in which each node represents a scored CPDAG for a subset of variables and the level of the node corresponds to the maximal order of conditional independencies that are encoded in the graph. As higher order independencies are tested in deeper recursive calls, they benefit from more bootstrap samples, and therefore more resistant to the curse-of-dimensionality. Moreover, the re-use of stable low order independencies allows greater computational efficiency. We also provide an algorithm for sampling CPDAGs efficiently from their posterior given the learned tree. We empirically demonstrate that the proposed algorithm scales well to hundreds of variables, and learns better MAP models and more reliable causal relationships between variables, than other state-of-the-art-methods.

MLJun 24, 2018
Constructing Deep Neural Networks by Bayesian Network Structure Learning

Raanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz et al.

We introduce a principled approach for unsupervised structure learning of deep neural networks. We propose a new interpretation for depth and inter-layer connectivity where conditional independencies in the input distribution are encoded hierarchically in the network structure. Thus, the depth of the network is determined inherently. The proposed method casts the problem of neural network structure learning as a problem of Bayesian network structure learning. Then, instead of directly learning the discriminative structure, it learns a generative graph, constructs its stochastic inverse, and then constructs a discriminative graph. We prove that conditional-dependency relations among the latent variables in the generative graph are preserved in the class-conditional discriminative graph. We demonstrate on image classification benchmarks that the deepest layers (convolutional and dense) of common networks can be replaced by significantly smaller learned structures, while maintaining classification accuracy---state-of-the-art on tested benchmarks. Our structure learning algorithm requires a small computational cost and runs efficiently on a standard desktop CPU.