Causal Markov Boundaries
This work addresses the problem of selecting optimal treatments for patients in healthcare settings where randomized trial data is limited, representing an incremental advance in causal feature selection.
The paper tackles feature selection for predicting post-intervention outcomes from pre-intervention variables, particularly in healthcare. It extends Markov boundaries to treatment-outcome pairs and shows in simulations that combining observational and experimental data improves feature selection and effect estimation.
Feature selection is an important problem in machine learning: the aim is to select the variables that lead to an optimal predictive model. In this paper, we focus on feature selection for predicting post-intervention outcomes from pre-intervention variables. We are motivated by healthcare settings, where the goal is often to select the treatment that will maximize a specific patient's outcome, but where there is often too little randomized controlled trial data to identify the conditional treatment effect well. We show how observational data can improve feature selection and effect estimation in two cases: (a) when we know the causal graph and have only observational data, and (b) when we do not know the causal graph but have observational data and limited experimental data. Our paper extends the notion of a Markov boundary to treatment-outcome pairs. We provide theoretical guarantees for the methods we introduce. On simulated data, we show that combining observational and experimental data improves feature selection and effect estimation.
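As background for the Markov boundary notion mentioned above, the following minimal sketch computes the classical Markov boundary of a single node in a known DAG, which under faithfulness consists of the node's parents, children, and the other parents of its children. It illustrates only the standard single-variable definition, not the paper's extension to treatment-outcome pairs. The function name `markov_boundary`, the toy graph, and its variable names are hypothetical and chosen purely for illustration.

```python
def markov_boundary(parents, target):
    """Classical Markov boundary of `target` in a DAG (assuming faithfulness).

    `parents` maps every node to the set of its parents.
    Returns the union of the target's parents, children, and spouses
    (the other parents of its children).
    """
    # Children: nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses: other parents of the target's children.
    spouses = {p for child in children for p in parents[child]} - {target}
    return set(parents.get(target, set())) | children | spouses


# Hypothetical toy graph (an assumption, not taken from the paper):
# X3 -> X1, X1 -> T, X1 -> Y, T -> Y, X2 -> Y, Y -> X4, X5 -> X4; X6 is isolated.
toy_graph = {
    "X3": set(),
    "X1": {"X3"},
    "X2": set(),
    "T": {"X1"},
    "Y": {"T", "X1", "X2"},
    "X4": {"Y", "X5"},
    "X5": set(),
    "X6": set(),
}

mb_y = markov_boundary(toy_graph, "Y")
print(sorted(mb_y))
```

In this toy graph, `X3` influences `Y` only through `X1`, so it falls outside the boundary, while `X5` enters it as a co-parent of `Y`'s child `X4`. Note that `X4` and `X5` appear only because the classical definition does not distinguish pre- from post-intervention variables.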