ME May 5
Copula-Based Endogeneity Correction for Doubly Robust Estimation of Treatment Effect
Sahil Shikalgar, Md. Noor-E-Alam
Doubly robust (DR) estimation of treatment effects relies on the untestable assumption that there is no unobserved confounding. This assumption is particularly problematic in healthcare research, where variables such as prescription refill rates serve as proxies for unobserved behaviors such as medication adherence. These proxy variables are often endogenous, exhibiting correlation with the regression error term due to unmeasured confounding or measurement error. We propose a copula-corrected doubly robust estimator that addresses endogeneity in both the treatment and outcome models without requiring instrumental variables. Gaussian copulas model the joint distribution of endogenous covariates and the error term, enabling consistent estimation while preserving the doubly robust property, which requires correct specification of either the treatment model or the outcome model, not both. Monte Carlo simulations demonstrate that naive DR estimation exhibits substantial bias under endogeneity, whereas our corrected estimator recovers unbiased treatment effects across different data-generating processes. We apply our method to examine the effect of nutritional counseling on blood pressure using National Health and Nutrition Examination Survey (NHANES) data. Naive DR estimation suggests counseling is associated with increased blood pressure. After copula correction, this effect becomes statistically insignificant, consistent with literature showing modest effects of nutritional counseling in reducing blood pressure. Our methodology provides researchers with a practical tool for obtaining treatment effects in the presence of endogeneity.
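The abstract does not spell out how the copula correction is constructed. As a minimal sketch, assuming the construction commonly used in the Gaussian-copula endogeneity-correction literature, each endogenous covariate is mapped through its empirical CDF and the inverse standard normal CDF, and the result is added as an extra regressor. The function name `copula_term` and the rank/(n + 1) scaling are illustrative assumptions, not the authors' implementation.

```python
from statistics import NormalDist


def copula_term(values):
    """Return Phi^{-1}(F_hat(x)) for each x, where F_hat is an empirical CDF.

    Assumption: F_hat(x) = average_rank(x) / (n + 1), which keeps every
    probability strictly inside (0, 1) so the inverse normal CDF is finite.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]
```

Under this assumption, the returned values would enter the treatment and outcome models as an additional column next to the endogenous covariate; how the two models are then combined into the DR estimator is left to the paper.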
ML Apr 30
A Novel Computational Framework for Causal Inference: Tree-Based Discretization with ILP-Based Matching
Tianyu Yang, Md. Noor-E-Alam
Causal inference is essential for data-driven decision-making, as it aims to uncover causal relationships from observational data. However, identifying causality remains challenging due to the potential for confounding and the distinction between correlation and causation. While recent advances in causal machine learning and matching algorithms have improved estimation accuracy, these methods often face trade-offs between interpretability and computational efficiency. This paper proposes a novel approach that combines a tree-based discretization technique, tailored for causal inference, with an integer linear programming-based matching algorithm. The discretization ensures approximately linear relationships for control datasets within strata, enabling effective matching, while the optimization framework targets global balance. The resulting algorithm is computationally efficient and yields less biased ATT estimates than state-of-the-art algorithms. Empirical evaluations demonstrate the proposed method's practical advantages over existing techniques in causal inference scenarios.
ME Feb 1, 2025
Optimizing Feature Selection in Causal Inference: A Three-Stage Computational Framework for Unbiased Estimation
Tianyu Yang, Md. Noor-E-Alam
Feature selection is an important but challenging task in causal inference for obtaining unbiased estimates of causal quantities. Properly selected features not only significantly reduce the time required to run a matching algorithm but, more importantly, can also reduce the bias and variance of the estimated causal quantities. When feature selection techniques are applied in causal inference, the crucial criterion is to select variables that, when used for matching, yield an unbiased and robust estimate of causal quantities. Recent research suggests that balancing only on treatment-associated variables introduces bias, while balancing on spurious variables increases variance. To address this issue, we propose an enhanced three-stage framework that shows a significant improvement in selecting the desired subset of variables compared to the existing state-of-the-art feature selection framework for causal inference, resulting in lower bias and variance in estimating the causal quantity. We evaluated our proposed framework using state-of-the-art synthetic data across various settings and observed superior performance within a feasible computation time, ensuring scalability for large-scale datasets. Finally, to demonstrate the applicability of our proposed methodology to large-scale real-world data, we evaluated an important US healthcare policy question related to the opioid epidemic: whether opioid use disorder has a causal relationship with suicidal behavior.
AI Apr 13, 2025
A Two-Stage Interpretable Matching Framework for Causal Inference
Sahil Shikalgar, Md. Noor-E-Alam
Matching in causal inference from observational data aims to construct treatment and control groups with similar distributions of covariates, thereby reducing confounding and ensuring an unbiased estimation of treatment effects. This matched sample closely mimics a randomized controlled trial (RCT), thus improving the quality of causal estimates. We introduce a novel Two-stage Interpretable Matching (TIM) framework for transparent and interpretable covariate matching. In the first stage, we perform exact matching across all available covariates. For treatment and control units without an exact match in the first stage, we proceed to the second stage. Here, we iteratively refine the matching process by removing the least significant confounder in each iteration and attempting exact matching on the remaining covariates. We learn a distance metric for the dropped covariates to quantify closeness to the treatment unit(s) within the corresponding strata. We use these high-quality matches to estimate the conditional average treatment effects (CATEs). To validate TIM, we conducted experiments on synthetic datasets with varying association structures and correlations. We assessed its performance by measuring bias in CATE estimation and evaluating multivariate overlap between treatment and control groups before and after matching. Additionally, we apply TIM to a real-world healthcare dataset from the Centers for Disease Control and Prevention (CDC) to estimate the causal effect of high cholesterol on diabetes. Our results demonstrate that TIM improves CATE estimates, increases multivariate overlap, and scales effectively to high-dimensional data, making it a robust tool for causal inference in observational data.
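The two stages described above can be illustrated with a rough sketch: exact matching on all covariates first, then repeated exact matching after dropping the least important remaining covariate. This is not the TIM implementation. The function name `staged_exact_match`, the dictionary data layout, and the assumption that a covariate importance ranking is supplied by the caller are all hypothetical, and the learned distance metric for dropped covariates is omitted.

```python
from collections import defaultdict


def staged_exact_match(treated, control, covariates_by_importance):
    """Match treated units to control units by staged exact matching.

    treated, control: dict of unit id -> dict of covariate name -> value.
    covariates_by_importance: covariate names, most important first
    (assumed to be given; the ranking method is not part of this sketch).
    Returns dict of treated id -> (matched control ids, covariates used).
    """
    matches = {}
    unmatched = set(treated)
    active = list(covariates_by_importance)
    while unmatched and active:
        # Group control units into strata defined by the active covariates.
        strata = defaultdict(list)
        for cid, row in control.items():
            strata[tuple(row[c] for c in active)].append(cid)
        # A treated unit is matched if its stratum contains any control unit.
        for tid in sorted(unmatched):
            key = tuple(treated[tid][c] for c in active)
            if key in strata:
                matches[tid] = (strata[key], tuple(active))
        unmatched.difference_update(matches)
        active.pop()  # drop the least important remaining covariate
    return matches
```

The first pass of the loop corresponds to the first stage (all covariates active). Each later pass relaxes the match by one covariate, and the recorded tuple of covariates makes explicit which variables each match is exact on.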
LG Nov 27, 2021
A Two-Stage Feature Selection Approach for Robust Evaluation of Treatment Effects in High-Dimensional Observational Data
Md Saiful Islam, Sahil Shikalgar, Md. Noor-E-Alam
A Randomized Control Trial (RCT) is considered the gold standard for evaluating the effect of any intervention or treatment. However, its feasibility is often hindered by ethical, economic, and legal considerations, making observational data a valuable alternative for drawing causal conclusions. Nevertheless, healthcare observational data presents a difficult challenge due to its high dimensionality, requiring careful consideration to ensure unbiased, reliable, and robust causal inferences. To overcome this challenge, we propose a novel two-stage feature selection technique called Outcome Adaptive Elastic Net (OAENet), explicitly designed for making robust causal inference decisions using matching techniques. OAENet offers two key advantages over existing methods: superior performance on correlated and high-dimensional data, and the ability to select specific sets of variables (including confounders and variables associated only with the outcome). This ensures robustness and facilitates an unbiased estimate of the causal effect. Numerical experiments on simulated data demonstrate that OAENet significantly outperforms state-of-the-art methods by either producing a higher-quality estimate or a comparable estimate in significantly less time. To illustrate the applicability of OAENet, we employ large-scale US healthcare data to estimate the effect of Opioid Use Disorder (OUD) on suicidal behavior. Compared with competing methods, OAENet produces estimates that closely align with existing literature on the relationship between OUD and suicidal behavior. Performance on both simulated and real-world data highlights that OAENet notably enhances the accuracy of treatment effect estimation and of policy evaluation based on causal inference.
OC Dec 22, 2020
A Computational Framework for Solving Nonlinear Binary Optimization Problems in Robust Causal Inference
Md Saiful Islam, Md Sarowar Morshed, Md. Noor-E-Alam
Identifying cause-effect relations among variables is a key step in the decision-making process. While causal inference requires randomized experiments, researchers and policymakers are increasingly using observational studies to test causal hypotheses due to the wide availability of observational data and the infeasibility of experiments. Matching is the most widely used technique for making causal inferences from observational data. However, the pair assignment process in one-to-one matching creates uncertainty in the inference because of different choices made by the experimenter. Recently, discrete optimization models have been proposed to tackle such uncertainty. Although robust inference is possible with discrete optimization models, they produce nonlinear problems and lack scalability. In this work, we propose greedy algorithms to solve robust causal inference test instances from observational data with continuous outcomes. We propose a unique framework to reformulate the nonlinear binary optimization problems as feasibility problems. By leveraging the structure of the feasibility formulation, we develop greedy schemes that are efficient in solving robust test problems. In many cases, the proposed algorithms achieve globally optimal solutions. We perform experiments on three real-world datasets to demonstrate the effectiveness of the proposed algorithms and compare our results with a state-of-the-art solver. Our experiments show that the proposed algorithms significantly outperform the exact method in terms of computation time while reaching the same conclusions for causal tests. Both numerical experiments and complexity analysis demonstrate that the proposed algorithms ensure the scalability required for harnessing the power of big data in the decision-making process.
AI Apr 10, 2019
Resilient Supplier Selection in Logistics 4.0 with Heterogeneous Information
Md Mahmudul Hassan, Dizuo Jiang, A. M. M. Sharif Ullah et al.
The supplier selection problem has gained extensive attention in prior studies. However, research based on the Fuzzy Multi-Attribute Decision Making (F-MADM) approach to ranking resilient suppliers in Logistics 4.0 is still in its infancy. The traditional MADM approach fails to address the resilient supplier selection problem in Logistics 4.0 primarily because of the large amount of data concerning some attributes that are quantitative, yet difficult to process while making decisions. Besides, some qualitative attributes prevalent in Logistics 4.0 entail imprecise perceptual or judgmental decision-relevant information, and are substantially different from those considered in traditional supplier selection problems. This study develops a Decision Support System (DSS) that helps the decision maker incorporate and process such imprecise heterogeneous data in a unified framework to rank a set of resilient suppliers in the Logistics 4.0 environment. The proposed framework induces a triangular fuzzy number from large-scale temporal data using the probability-possibility consistency principle. A large amount of non-temporal data presented graphically is processed by extracting granular information that is imprecise in nature. Fuzzy linguistic variables are used to map the qualitative attributes. A fuzzy-based TOPSIS method is then adopted to generate the ranking scores of alternative suppliers. These ranking scores are used as input to a Multi-Choice Goal Programming (MCGP) model to determine the optimal order allocation for the respective suppliers. Finally, a sensitivity analysis assesses how the Suppliers Cost versus Resilience Index (SCRI) changes when different priorities are set for the respective cost and resilience attributes.
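To illustrate the ranking step, here is a minimal sketch of classical (crisp) TOPSIS. It is a simplification: the paper uses a fuzzy-based variant operating on triangular fuzzy numbers, which is not reproduced here. The function name `topsis`, the argument layout, and the example numbers in the usage note are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives with classical TOPSIS (crisp values only).

    matrix: one row per alternative, one column per criterion.
    weights: one weight per criterion.
    benefit: True if larger is better for that criterion, False for a cost.
    Assumes no all-zero column and at least two distinct alternatives,
    otherwise a division by zero occurs.
    Returns closeness scores in [0, 1]; higher means closer to the ideal.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*v))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores
```

For example, `topsis([[10, 0.9], [20, 0.5], [10, 0.5]], [0.5, 0.5], [False, True])` treats the first column as a cost and the second as a resilience score. The first supplier is best on both criteria and the second is worst on both, so they receive the highest and lowest scores respectively.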
AI Jun 4, 2018
A Possibility Distribution Based Multi-Criteria Decision Algorithm for Resilient Supplier Selection Problems
Dizuo Jiang, Md Mahmudul Hassan, Tasnim Ibn Faiz et al.
Thus far, limited research has been performed on resilient supplier selection - a problem that requires simultaneous consideration of a set of numerical and linguistic evaluation criteria, which are substantially different from those of traditional supplier selection problems. Essentially, resilient supplier selection entails a key sourcing decision for an organization to gain competitive advantage. In the presence of multiple conflicting evaluation criteria, contradicting decision makers, and imprecise decision-relevant information (DRI), this problem becomes even more difficult to solve with classical optimization approaches. However, prior research on MCDA-based supplier selection has lacked the ability to seamlessly integrate numerical and linguistic evaluation criteria while also considering multiple decision makers. To address these challenges, we present a comprehensive decision-making framework for ranking a set of suppliers from a resiliency perspective. The proposed algorithm is capable of leveraging imprecise and aggregated DRI obtained from crisp numerical assessments and reliability-adjusted linguistic appraisals from a group of decision makers. We adapt two popular tools - Single Valued Neutrosophic Sets (SVNS) and Interval-valued fuzzy sets (IVFS) - and for the first time extend them to incorporate both crisp and linguistic evaluations in a group decision-making platform to obtain aggregated SVNS and IVFS decision matrices. This information is then used to rank the resilient suppliers using the TOPSIS method. We present a case study to illustrate the mechanism of the proposed algorithm.