AI · May 31
Transferring Information Across Interventions in Causal Bayesian Optimization
Mohammad Ali Javidian
Bayesian optimization is a popular way to optimize expensive systems, where every experiment, simulation, or intervention costs time or money. In its standard form, it treats the variables we control as plain inputs to a black box and cannot distinguish mere correlation from genuine cause and effect. Causal Bayesian optimization closes part of this gap by using a known causal graph together with observational data to decide which variables are worth intervening on. Existing methods, however, learn the effect of each possible intervention almost in isolation, even though in a causal system these effects usually share the same underlying mechanisms. We propose graph-coupled causal Bayesian optimization, which ties the different intervention effects together through the uncertainty we have about a small set of shared causal parameters. The result is a causal kernel that lets evidence collected from one intervention improve our estimates for related interventions. For identifiable linear Gaussian causal models, we show that this kernel has low rank, bounded by the number of shared parameters rather than by the size of the intervention menu. This in turn yields an information-gain bound that grows only logarithmically in the optimization horizon, and a regret bound that cleanly separates three sources of error: optimization, causal estimation, and the choice of which intervention sets to consider. We also describe nonlinear and adaptive extensions. Across theory-aligned Gaussian systems, shared-mechanism stress tests, and standard causal optimization benchmarks, the method retains the benefits of causal Bayesian optimization while transferring information across related interventions. The gains are clearest when direct interventions on the target's parents are unavailable and sparse interventional data must be reused across a large family of candidate interventions.
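The following is a generic sketch of why coupling intervention effects through shared parameters bounds the kernel rank. It assumes, for this illustration only, that the expected outcome of each intervention I at value x is linear in a shared parameter vector θ ∈ R^p through a hypothetical feature map φ_I. This is the standard Bayesian linear-model argument, not the paper's exact kernel:

$$
\begin{aligned}
f_I(x) &= \phi_I(x)^\top \theta, \qquad \theta \sim \mathcal{N}(\mu, \Sigma),\ \ \Sigma \in \mathbb{R}^{p \times p},\\
k\big((I, x), (J, x')\big) &= \operatorname{Cov}\big(f_I(x), f_J(x')\big) = \phi_I(x)^\top \Sigma\, \phi_J(x'),\\
K &= \Phi \Sigma \Phi^\top, \qquad \Phi = \big[\phi_{I_1}(x_1), \dots, \phi_{I_N}(x_N)\big]^\top \in \mathbb{R}^{N \times p},\\
\operatorname{rank} K &\le \min(N, p).
\end{aligned}
$$

Under this assumption, the bound depends on p and not on how many distinct interventions appear among the N queried points. Because every f_J depends on the same θ, observing f_I(x) updates the posterior over θ and therefore the prediction for every related intervention J.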
LG · Feb 16
Extending Multi-Source Bayesian Optimization With Causality Principles
Luuk Jacobs, Mohammad Ali Javidian
Multi-Source Bayesian Optimization (MSBO) is a variant of the traditional Bayesian Optimization (BO) framework for optimizing a black-box objective function using multiple information sources, such as simulations, surrogate models, or real-world experiments. However, traditional MSBO assumes that the input variables of the objective function are independent and identically distributed. This limits its effectiveness in scenarios where causal information is available and interventions can be performed, such as clinical trials or policy-making. In the single-source domain, Causal Bayesian Optimization (CBO) extends standard BO with the principles of causality, enabling better modeling of variable dependencies. This leads to more accurate optimization, improved decision-making, and more efficient use of low-cost information sources. In this article, we propose a principled integration of the MSBO and CBO methodologies in the multi-source domain, leveraging the strengths of both to enhance optimization efficiency and reduce computational complexity in higher-dimensional problems. We present the theoretical foundations of both Causal and Multi-Source Bayesian Optimization and demonstrate how their synergy informs our Multi-Source Causal Bayesian Optimization (MSCBO) algorithm. We compare the performance of MSCBO against its foundational counterparts on both synthetic and real-world datasets with varying levels of noise, highlighting the robustness and applicability of MSCBO. Based on our findings, we conclude that integrating MSBO with the causality principles of CBO facilitates dimensionality reduction and lowers operational costs, ultimately improving convergence speed, performance, and scalability.
LG · Jan 7
Causally-Aware Information Bottleneck for Domain Adaptation
Mohammad Ali Javidian
We tackle a common domain adaptation setting in causal systems. In this setting, the target variable is observed in the source domain but is entirely missing in the target domain. We aim to impute the target variable in the target domain from the remaining observed variables under various shifts. We frame this as learning a compact, mechanism-stable representation. This representation preserves information relevant for predicting the target while discarding spurious variation. For linear Gaussian causal models, we derive a closed-form Gaussian Information Bottleneck (GIB) solution. This solution reduces to a canonical correlation analysis (CCA)-style projection and offers Directed Acyclic Graph (DAG)-aware options when desired. For nonlinear or non-Gaussian data, we introduce a Variational Information Bottleneck (VIB) encoder-predictor. This approach scales to high dimensions and can be trained on source data and deployed zero-shot to the target domain. Across synthetic and real datasets, our approach consistently attains accurate imputations, supporting practical use in high-dimensional causal models and furnishing a unified, lightweight toolkit for causal domain adaptation.
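For orientation, here is the classical Gaussian Information Bottleneck solution for jointly Gaussian X (here, the observed covariates) and Y (the target). It follows Chechik, Globerson, Tishby, and Weiss (2005). It is shown as background only; it is not the paper's own derivation, and the DAG-aware options are not reproduced:

$$
\begin{aligned}
&\min_{A}\; I(X;T) - \beta\, I(T;Y), \qquad T = AX + \xi,\ \ \xi \sim \mathcal{N}(0, I),\\
&\Sigma_{x|y} = \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{yx}, \qquad v_i^\top \Sigma_{x|y}\Sigma_x^{-1} = \lambda_i v_i^\top,\ \ \lambda_1 \le \lambda_2 \le \cdots,\\
&\lambda_i = 1 - \rho_i^2 \quad (\rho_i \text{ the } i\text{-th canonical correlation between } X \text{ and } Y),\\
&A = \operatorname{diag}(\alpha_1, \alpha_2, \dots)\,[v_1, v_2, \dots]^\top, \qquad
\alpha_i = \begin{cases}\sqrt{\dfrac{\beta(1-\lambda_i) - 1}{\lambda_i\, v_i^\top \Sigma_x v_i}}, & \beta > \beta_i^c = \dfrac{1}{1-\lambda_i},\\[4pt] 0, & \text{otherwise.}\end{cases}
\end{aligned}
$$

The left eigenvectors v_i coincide with the canonical directions of X relative to Y. As β grows, the projection therefore admits the most target-correlated CCA directions first. This is one way to read the CCA-style reduction mentioned above.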
LG · Jan 20, 2022
Unicorn: Reasoning about Configurable System Performance through the lens of Causality
Md Shahriar Iqbal, Rahul Krishna, Mohammad Ali Javidian et al.
Modern computer systems are highly configurable, with a total variability space that is sometimes larger than the number of atoms in the universe. Understanding and reasoning about the performance behavior of highly configurable systems over such a vast and variable space is challenging. State-of-the-art methods for performance modeling and analysis rely on predictive machine learning models; therefore, they (i) become unreliable in unseen environments (e.g., different hardware or workloads) and (ii) may produce incorrect explanations. To tackle this, we propose a new method, called Unicorn, which (i) captures intricate interactions between configuration options across the software-hardware stack and (ii) describes how such interactions can impact performance variations via causal inference. We evaluated Unicorn on six highly configurable systems: three on-device machine learning systems, a video encoder, a database management system, and a data analytics pipeline. The experimental results indicate that Unicorn outperforms state-of-the-art performance debugging and optimization methods in finding effective repairs for performance faults and in finding configurations with near-optimal performance. Further, unlike existing methods, the learned causal performance models reliably predict performance in new environments.
QUANT-PH · Oct 29, 2021
Learning Circular Hidden Quantum Markov Models: A Tensor Network Approach
Mohammad Ali Javidian, Vaneet Aggarwal, Zubin Jacob
In this paper, we propose circular Hidden Quantum Markov Models (c-HQMMs), which can be applied to model temporal data in quantum datasets (with classical datasets as a special case). We show that c-HQMMs are equivalent to a constrained tensor network model (more precisely, a circular Local Purified State with positive-semidefinite decomposition). This equivalence enables us to provide an efficient learning model for c-HQMMs. The proposed learning approach is evaluated on six real datasets and demonstrates the advantage of c-HQMMs over HQMMs, circular HMMs, and HMMs on multiple datasets.
QUANT-PH · Apr 24, 2021
Quantum causal inference in the presence of hidden common causes: An entropic approach
Mohammad Ali Javidian, Vaneet Aggarwal, Zubin Jacob
Quantum causality is an emerging field of study that has the potential to greatly advance our understanding of quantum systems. In this paper, we put forth a theoretical framework for merging quantum information science and causal inference by exploiting entropic principles. For this purpose, we leverage the tradeoff between the entropy of the hidden cause and the conditional mutual information of the observed variables to develop a scalable algorithmic approach for inferring causality in the presence of latent confounders (common causes) in quantum systems. As an application, we consider a system of three entangled qubits and transmit the second and third qubits over separate noisy quantum channels. In this model, we validate that the first qubit is a latent confounder and the common cause of the second and third qubits. In contrast, when two entangled qubits are prepared and one of them is sent over a noisy channel, there is no common confounder. We also demonstrate that, when the variables are classical, the proposed approach outperforms classical causal inference on the Tübingen database by exploiting quantum dependence between variables through density matrices rather than joint probability distributions. Thus, the proposed approach unifies classical and quantum causal inference in a principled way.
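The conditional mutual information mentioned above is easiest to see in its classical (Shannon) form. A minimal sketch, assuming a discrete joint pmf stored as a dict, follows. In the quantum setting, each Shannon entropy is replaced by the von Neumann entropy S(ρ) = -tr(ρ log ρ) of the corresponding reduced density matrix. The helper names and the toy distribution are illustrative, not the paper's code.

```python
from collections import defaultdict
from math import log2

def entropy(joint, idx):
    """Shannon entropy in bits of the marginal of `joint` on coordinate indices idx."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def cmi(joint, x, y, z=()):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); x, y, z are tuples of indices."""
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))

# Hypothetical common-cause example over (X, Y, Z): Z is a fair bit and X = Y = Z.
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(cmi(joint, (0,), (1,)))        # X and Y are dependent marginally
print(cmi(joint, (0,), (1,), (2,)))  # ...but independent once Z is given
```

A dependence that vanishes once a low-entropy common cause is conditioned on is the kind of signature that an entropy-versus-CMI tradeoff is designed to detect.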
LG · Feb 27, 2021
Scalable Causal Domain Adaptation
Mohammad Ali Javidian, Om Pandey, Pooyan Jamshidi
One of the most critical problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift when invariances exist across domains. A main limitation of existing causal inference methods for this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across source and target domains based on Markov blanket discovery. SCTL does not require prior knowledge of the causal structure, the type of interventions, or the intervention targets. SCTL has an intrinsic locality that makes it practically scalable and robust: local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real datasets in low-dimensional and high-dimensional settings.
LG · Feb 23, 2021
Accelerating Recursive Partition-Based Causal Structure Learning
Md. Musfiqur Rahman, Ayman Rasheed, Md. Mosaddek Khan et al.
Causal structure discovery from observational data is fundamental to the causal understanding of autonomous systems such as medical decision support systems, advertising campaigns, and self-driving cars. This is essential for solving well-known causal decision-making and prediction problems associated with those real-world applications. Recently, recursive causal discovery algorithms have gained particular attention in the research community because they provide good results by applying Conditional Independence (CI) tests to smaller sub-problems. However, each such algorithm needs a refinement function to remove undesired causal relations from the discovered graphs. Notably, as the problem size grows, the computational cost (i.e., the number of CI tests) of the refinement function makes an algorithm expensive to deploy in practice. This paper proposes a generic causal structure refinement strategy that can locate the undesired relations with a small number of CI tests, thus speeding up the algorithm for large and complex problems. We theoretically prove the correctness of our algorithm. We then empirically evaluate its performance against state-of-the-art algorithms in terms of solution quality and completion time on synthetic and real datasets.
SE · Oct 12, 2020
CADET: Debugging and Fixing Misconfigurations using Counterfactual Reasoning
Rahul Krishna, Md Shahriar Iqbal, Mohammad Ali Javidian et al.
Modern computing platforms are highly configurable, with thousands of interacting configurations. However, configuring these systems is challenging: erroneous configurations can cause unexpected non-functional faults. This paper proposes CADET (short for Causal Debugging Toolkit), which enables users to identify, explain, and fix the root cause of non-functional faults early and in a principled fashion. CADET builds a causal model by observing the performance of the system under different configurations. Then, it uses causal path extraction followed by counterfactual reasoning over the causal model to: (a) identify the root causes of non-functional faults, (b) estimate the effects of various configurable parameters on the performance objective(s), and (c) prescribe candidate repairs to the relevant configuration options to fix the non-functional fault. We evaluated CADET on 5 highly configurable systems deployed on 3 NVIDIA Jetson systems-on-chip. We compare CADET with state-of-the-art configuration optimization and ML-based debugging approaches. The experimental results indicate that CADET can find effective repairs for faults in multiple non-functional properties with up to 17% more accuracy, 28% higher gain, and a $40\times$ speed-up compared with other ML-based performance debugging methods. Compared to multi-objective optimization approaches, CADET can find fixes up to $9\times$ faster with comparable or better performance gain. Our case study of non-functional faults reported in NVIDIA's forum shows that CADET can find 14% better repairs than the experts' advice in less than 30 minutes.
LG · May 29, 2020
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Mohammad Ali Javidian, Marco Valtorta, Pooyan Jamshidi
This paper provides a graphical characterization of Markov blankets in chain graphs (CGs) under the Lauritzen-Wermuth-Frydenberg (LWF) interpretation. The characterization differs from the well-known one for Bayesian networks and generalizes it. We provide a novel scalable and sound algorithm for Markov blanket discovery in LWF CGs and prove that the Grow-Shrink algorithm, the IAMB algorithm, and its variants are still correct for Markov blanket discovery in LWF CGs under the same assumptions as for Bayesian networks. We provide a sound and scalable constraint-based framework for learning the structure of LWF CGs from faithful, causally sufficient data and prove its correctness when the Markov blanket discovery algorithms in this paper are used. Our proposed algorithms compare favorably or competitively with the state-of-the-art LCD (Learn Chain graphs via Decomposition) algorithm, depending on the algorithm used for Markov blanket discovery. Because they exploit locality, our proposed algorithms make a broad range of inference and learning problems computationally tractable and more reliable.
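The Grow-Shrink procedure named above can be made concrete with a minimal Python sketch. It works on an ordinary DAG, not an LWF chain graph, and a perfect d-separation oracle (moralized ancestral graph criterion) stands in for a statistical CI test. The DAG, node names, and helper functions are hypothetical illustrations, not the paper's code.

```python
from itertools import combinations

# Hypothetical DAG stored as child -> set of parents: A -> C <- B, C -> D <- E.
DAG = {"A": set(), "B": set(), "E": set(), "C": {"A", "B"}, "D": {"C", "E"}}

def ancestral_set(dag, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for parent in dag[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def d_separated(dag, x, y, given):
    """Oracle CI test: moralize the ancestral graph of {x, y} and `given`, then check separation."""
    anc = ancestral_set(dag, {x, y} | set(given))
    nbrs = {v: set() for v in anc}
    for child in anc:
        parents = dag[child]
        for p in parents:                        # keep parent-child links, undirected
            nbrs[child].add(p)
            nbrs[p].add(child)
        for a, b in combinations(parents, 2):    # "marry" co-parents
            nbrs[a].add(b)
            nbrs[b].add(a)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in nbrs[v]:
            if w not in seen and w not in given:
                seen.add(w)
                stack.append(w)
    return True

def grow_shrink(target, variables, indep):
    """Grow-Shrink Markov blanket discovery driven by a CI test indep(t, v, cond_set)."""
    mb = []
    changed = True
    while changed:  # grow: add any variable still dependent on target given current mb
        changed = False
        for v in variables:
            if v != target and v not in mb and not indep(target, v, set(mb)):
                mb.append(v)
                changed = True
    for v in list(mb):  # shrink: drop variables made independent by the rest of mb
        if indep(target, v, set(mb) - {v}):
            mb.remove(v)
    return set(mb)

oracle = lambda t, v, s: d_separated(DAG, t, v, s)
print(sorted(grow_shrink("C", sorted(DAG), oracle)))
print(sorted(grow_shrink("A", sorted(DAG), oracle)))
```

In a DAG, the Markov blanket consists of the parents, the children, and the children's other parents. The two calls should therefore recover {A, B, D, E} for C and {B, C} for A.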
AI · May 27, 2020
Learning LWF Chain Graphs: an Order Independent Algorithm
Mohammad Ali Javidian, Marco Valtorta, Pooyan Jamshidi
LWF chain graphs combine directed acyclic graphs and undirected graphs. We present a PC-like algorithm that finds the structure of chain graphs under the faithfulness assumption, resolving the scalability problem of the algorithm proposed by Studený (1997). We prove that our PC-like algorithm is order dependent, in the sense that the output can depend on the order in which the variables are given. This order dependence can be very pronounced in high-dimensional settings. We propose two modifications of the PC-like algorithm that remove part or all of this order dependence. Simulation results under a variety of settings demonstrate that the PC-like algorithms perform competitively with the decomposition-based LCD algorithm of Ma et al. (2008) in low-dimensional settings and better in high-dimensional settings.
AI · Feb 24, 2020
AMP Chain Graphs: Minimal Separators and Structure Learning Algorithms
Mohammad Ali Javidian, Marco Valtorta, Pooyan Jamshidi
We address the problem of finding a minimal separator in an Andersson-Madigan-Perlman chain graph (AMP CG), namely, finding a set Z of nodes that separates a given nonadjacent pair of nodes such that no proper subset of Z separates that pair. We analyze several versions of this problem and offer polynomial-time algorithms for each. These include finding a minimal separator from a restricted set of nodes, finding a minimal separator for two given disjoint sets, and testing whether a given separator is minimal. To address the problem of learning the structure of AMP CGs from data, we show that the PC-like algorithm (Peña, 2012) is order-dependent, in the sense that the output can depend on the order in which the variables are given. We propose several modifications of the PC-like algorithm that remove part or all of this order-dependence. We also extend the decomposition-based approach for learning Bayesian networks (BNs) proposed by Xie et al. (2006) to learn AMP CGs, which include BNs as a special case, under the faithfulness assumption. We prove the correctness of our extension using the minimal separator results. Experiments on standard benchmarks and on synthetically generated models and data demonstrate the competitive performance of our decomposition-based method, called LCD-AMP, in comparison with the (modified versions of the) PC-like algorithm. The LCD-AMP algorithm usually outperforms the PC-like algorithm, and our modifications of the PC-like algorithm learn structures that are more similar to the underlying ground truth graphs than those of the original PC-like algorithm, especially in high-dimensional settings. In particular, we empirically show that the results of both algorithms are more accurate and more stable when the sample size is reasonably large and the underlying graph is sparse.
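The minimality test has a simple form in the easiest setting, an ordinary undirected graph. A separator Z of x and y is minimal exactly when, after Z is removed, every node of Z has a neighbor in the component of x and a neighbor in the component of y. The Python sketch below (hypothetical helper names) implements only that undirected check. The AMP chain-graph case treated in the paper is more involved and is not reproduced here.

```python
def reachable(graph, start, blocked):
    """Nodes reachable from `start` in an undirected graph without entering `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        for w in graph[stack.pop()]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return seen

def is_minimal_separator(graph, x, y, z):
    """True iff z separates x from y and no proper subset of z does.
    Assumes x and y are distinct, nonadjacent, and not in z."""
    z = set(z)
    comp_x = reachable(graph, x, z)
    if y in comp_x:
        return False  # z does not separate x and y at all
    comp_y = reachable(graph, y, z)
    # Every node of z must touch both the x-side and the y-side component;
    # otherwise it could be dropped and the rest would still separate.
    return all(bool(graph[v] & comp_x) and bool(graph[v] & comp_y) for v in z)

# Hypothetical undirected graph: a 4-cycle a-b-d-c-a plus a pendant node e attached to b.
GRAPH = {"a": {"b", "c"}, "b": {"a", "d", "e"}, "c": {"a", "d"},
         "d": {"b", "c"}, "e": {"b"}}
print(is_minimal_separator(GRAPH, "a", "d", {"b", "c"}))       # separates, minimal
print(is_minimal_separator(GRAPH, "a", "d", {"b", "c", "e"}))  # separates, not minimal
print(is_minimal_separator(GRAPH, "a", "d", {"b"}))            # does not separate
```

The check runs in time linear in the size of the graph (two graph searches plus one pass over Z). It is therefore consistent with the polynomial-time claim above, although it is not the paper's algorithm.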
ML · Oct 1, 2019
Order-Independent Structure Learning of Multivariate Regression Chain Graphs
Mohammad Ali Javidian, Marco Valtorta, Pooyan Jamshidi
This paper deals with multivariate regression chain graphs (MVR CGs), which were introduced by Cox and Wermuth [3,4] to represent linear causal models with correlated errors. We consider the PC-like algorithm for structure learning of MVR CGs, which is a constraint-based method proposed by Sonntag and Peña in [18]. We show that the PC-like algorithm is order-dependent, in the sense that the output can depend on the order in which the variables are given. This order-dependence is a minor issue in low-dimensional settings. However, it can be very pronounced in high-dimensional settings, where it can lead to highly variable results. We propose two modifications of the PC-like algorithm that remove part or all of this order-dependence. Simulations under a variety of settings demonstrate the competitive performance of our algorithms in comparison with the original PC-like algorithm in low-dimensional settings and improved performance in high-dimensional settings.
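For intuition about the order dependence discussed above, the following minimal Python sketch shows the analogous fix for the skeleton phase of the ordinary DAG PC algorithm: the "stable" rule of Colombo and Maathuis (2014). It is not the MVR chain-graph algorithm itself. At each level, conditioning sets come from adjacency sets frozen at the start of that level, so the edges removed at a level do not depend on the order in which variables are visited. The CI oracle is a hand-listed set of independencies for a hypothetical DAG A -> C <- B, C -> D. With such a perfect oracle, both orders agree even for plain PC; the stable rule matters when finite-sample tests make errors.

```python
from itertools import combinations

def pc_stable_skeleton(nodes, indep):
    """PC skeleton phase with the 'stable' rule: at each level, conditioning sets are
    drawn from adjacency sets frozen at the start of that level, and all edges found
    separable at that level are deleted together."""
    adj = {v: set(nodes) - {v} for v in nodes}
    level = 0
    while any(len(adj[v]) - 1 >= level for v in nodes):
        frozen = {v: set(adj[v]) for v in nodes}  # a(X) snapshot for this level
        separable = set()
        for x in nodes:
            for y in frozen[x]:
                for cond in combinations(sorted(frozen[x] - {y}), level):
                    if indep(x, y, set(cond)):
                        separable.add(frozenset((x, y)))
                        break
        for edge in separable:
            a, b = tuple(edge)
            adj[a].discard(b)
            adj[b].discard(a)
        level += 1
    return {frozenset((x, y)) for x in nodes for y in adj[x]}

# Hypothetical perfect CI oracle for the DAG A -> C <- B, C -> D, listing every
# (pair, conditioning set) that is independent in that DAG.
FACTS = {
    (frozenset("AB"), frozenset()),
    (frozenset("AD"), frozenset("C")), (frozenset("AD"), frozenset("BC")),
    (frozenset("BD"), frozenset("C")), (frozenset("BD"), frozenset("AC")),
}

def oracle(x, y, cond):
    return (frozenset((x, y)), frozenset(cond)) in FACTS

print(sorted("".join(sorted(e)) for e in pc_stable_skeleton(list("ABCD"), oracle)))
print(sorted("".join(sorted(e)) for e in pc_stable_skeleton(list("DCBA"), oracle)))
```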
AI · Feb 26, 2019
Transfer Learning for Performance Modeling of Configurable Systems: A Causal Analysis
Mohammad Ali Javidian, Pooyan Jamshidi, Marco Valtorta
Modern systems (e.g., deep neural networks, big data analytics, and compilers) are highly configurable, which means they exhibit different performance behavior under different configurations. The fundamental challenge is that one cannot simply measure all configurations due to the sheer size of the configuration space. Transfer learning has been used to reduce the measurement effort by transferring knowledge about the performance behavior of systems across environments. Previous research has shown that statistical models are indeed transferable across environments. In this work, we investigate identifiability and transportability of causal effects and statistical relations in highly configurable systems. Our causal analysis agrees with previous exploratory analysis [Jamshidi17] and confirms that the causal effects of configuration options can be carried over across environments with high confidence. We expect that the ability to carry over causal relations will enable effective performance analysis of highly configurable systems.
DS · Nov 20, 2018
On a hypergraph probabilistic graphical model
Mohammad Ali Javidian, Linyuan Lu, Marco Valtorta et al.
We propose a directed acyclic hypergraph framework for a probabilistic graphical model that we call Bayesian hypergraphs. The space of directed acyclic hypergraphs is much larger than the space of chain graphs. Hence Bayesian hypergraphs can model much finer factorizations than Bayesian networks or LWF chain graphs and provide simpler and more computationally efficient procedures for factorizations and interventions. Bayesian hypergraphs also allow a modeler to represent causal patterns of interaction such as Noisy-OR graphically (without additional annotations). We introduce global, local and pairwise Markov properties of Bayesian hypergraphs and prove under which conditions they are equivalent. We define a projection operator, called shadow, that maps Bayesian hypergraphs to chain graphs, and show that the Markov properties of a Bayesian hypergraph are equivalent to those of its corresponding chain graph. We extend the causal interpretation of LWF chain graphs to Bayesian hypergraphs and provide corresponding formulas and a graphical criterion for intervention.
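For readers unfamiliar with the Noisy-OR pattern mentioned above, here is the standard noisy-OR conditional distribution as a minimal Python sketch. The function name, the parent strengths, and the optional leak term are illustrative assumptions, not the paper's notation. The abstract's point is that Bayesian hypergraphs can express this pattern graphically rather than through extra annotations.

```python
def noisy_or(active_parents, strengths, leak=0.0):
    """P(Y = 1 | parents) under a noisy-OR model.

    Each active parent i independently fails to switch Y on with probability
    1 - strengths[i]; `leak` is the chance Y turns on with no active parent.
    Y stays off only if the leak and every active parent all fail."""
    p_off = 1.0 - leak
    for parent in active_parents:
        p_off *= 1.0 - strengths[parent]
    return 1.0 - p_off

# Hypothetical parent strengths: three numbers (two strengths plus the leak)
# replace a 4-row table for P(Y=1 | a, b).
strengths = {"a": 0.8, "b": 0.5}
for active in ([], ["a"], ["b"], ["a", "b"]):
    print(active, round(noisy_or(active, strengths, leak=0.1), 3))
```

The parameter count grows linearly in the number of parents instead of exponentially, which is why such interaction patterns benefit from a compact representation.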
AI · Jun 27, 2018
Comment on: Decomposition of structural learning about directed acyclic graphs [1]
Mohammad Ali Javidian, Marco Valtorta
We propose an alternative proof concerning necessary and sufficient conditions to split the problem of searching for d-separators and building the skeleton of a DAG into small problems for every node of a separation tree T. The proof is simpler than the original [1]. The same proof structure has been used in [2] for learning the structure of multivariate regression chain graphs (MVR CGs).
AI · Jun 25, 2018
A Proof of the Front-Door Adjustment Formula
Mohammad Ali Javidian, Marco Valtorta
We provide a proof of the Front-Door adjustment formula using the do-calculus.
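Suppose M intercepts every directed path from X to Y, there is no unblocked back-door path from X to M, and X blocks every back-door path from M to Y. The front-door formula then states $P(y \mid do(x)) = \sum_m P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x')$. The minimal Python sketch below checks this numerically on a hypothetical binary model with a hidden confounder U. It compares the formula, which uses only observational quantities, against the ground-truth intervention and against naive conditioning. All probabilities are made up for illustration.

```python
from itertools import product

# Hypothetical binary SCM with a hidden confounder U: U -> X -> M -> Y and U -> Y.
P_U1 = 0.4                                   # P(U=1)
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}              # P(X=1 | U=u)
P_M1_GIVEN_X = {0: 0.1, 1: 0.7}              # P(M=1 | X=x)
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Y=1 | M=m, U=u)

def bern(p1, v):
    """Probability that a binary variable with P(value 1) = p1 takes value v."""
    return p1 if v == 1 else 1.0 - p1

# Observational joint over the observed variables (X, M, Y), with U summed out.
JOINT = {}
for u, x, m, y in product((0, 1), repeat=4):
    p = (bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
         * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MU[(m, u)], y))
    JOINT[(x, m, y)] = JOINT.get((x, m, y), 0.0) + p

def prob(**fixed):
    """Observational probability that the named variables (x, m, y) take the given values."""
    names = ("x", "m", "y")
    return sum(p for key, p in JOINT.items()
               if all(key[names.index(k)] == v for k, v in fixed.items()))

def front_door(x, y):
    """P(Y=y | do(X=x)) computed from observational quantities only."""
    return sum(prob(x=x, m=m) / prob(x=x)
               * sum(prob(x=xp, m=m, y=y) / prob(x=xp, m=m) * prob(x=xp) for xp in (0, 1))
               for m in (0, 1))

def truth(x, y):
    """Ground truth: delete U -> X, set X = x, and keep U's effect on Y."""
    return sum(bern(P_U1, u) * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MU[(m, u)], y)
               for u in (0, 1) for m in (0, 1))

def naive(x, y):
    """Plain conditioning P(Y=y | X=x), which is biased here by U."""
    return prob(x=x, y=y) / prob(x=x)

for x in (0, 1):
    print(x, round(front_door(x, 1), 4), round(truth(x, 1), 4), round(naive(x, 1), 4))
```

The formula and the ground truth should agree up to floating-point error, while naive conditioning generally differs because U opens a back-door path from X to Y.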
AI · Jun 3, 2018
Structural Learning of Multivariate Regression Chain Graphs via Decomposition
Mohammad Ali Javidian, Marco Valtorta
We extend the decomposition approach for learning Bayesian networks (BNs) proposed by Xie et al. to learning multivariate regression chain graphs (MVR CGs), which include BNs as a special case. The same advantages of this decomposition approach hold in the more general setting: reduced complexity and increased power of computational independence tests. Moreover, latent (hidden) variables can be represented in MVR CGs by using bidirected edges, and our algorithm correctly recovers any independence structure that is faithful to an MVR CG, thus greatly extending the range of applications of decomposition-based model selection techniques. Simulations under a variety of settings demonstrate the competitive performance of our method in comparison with the PC-like algorithm (Sonntag and Peña). In fact, the decomposition-based algorithm usually outperforms the PC-like algorithm except in running time. The performance of both algorithms is much better when the underlying graph is sparse.
ME · Mar 9, 2018
On the Properties of MVR Chain Graphs
Mohammad Ali Javidian, Marco Valtorta
Depending on the interpretation of the type of edges, a chain graph can represent different relations between variables and thereby independence models. Three interpretations, known by the acronyms LWF, MVR, and AMP, are prevalent. Multivariate regression chain graphs (MVR CGs) were introduced by Cox and Wermuth in 1993. We review Markov properties for MVR chain graphs and propose an alternative global and local Markov property for them. Except for pairwise Markov properties, we show that for MVR chain graphs all Markov properties in the literature are equivalent for semi-graphoids. We derive a new factorization formula for MVR chain graphs which is more explicit than and different from the proposed factorizations for MVR chain graphs in the literature. Finally, we provide a summary table comparing different features of LWF, AMP, and MVR chain graphs.