Markov Boundary Discovery with Ridge Regularized Linear Models
This work addresses investigators' reluctance to draw causal interpretations from variable selection methods and offers a solution for domains such as genomics, though the contribution is incremental because it builds on existing theory.
The paper tackles the problem of causal interpretation in variable selection by showing that modified ridge regularized linear models can approximate a subset of the Markov boundary, with a worst-case bound on the space of possible solutions. Experimental results on gene expression data show that the modified models are competitive against state-of-the-art algorithms.
Ridge regularized linear models (RRLMs), such as ridge regression and the SVM, are a popular group of methods used in conjunction with coefficient hypothesis testing to discover explanatory variables that have a significant multivariate association with a response. However, many investigators are reluctant to draw causal interpretations of the selected variables because the capabilities of RRLMs in causal inference are incompletely understood. Under reasonable assumptions, we show that a modified form of RRLMs can get very close to identifying a subset of the Markov boundary by providing a worst-case bound on the space of possible solutions. The results hold for any convex loss, even when the underlying functional relationship is nonlinear and the solution is not unique. Our approach combines ideas from Markov boundary theory and sufficient dimension reduction theory. Experimental results show that the modified RRLMs are competitive against state-of-the-art algorithms in discovering part of the Markov boundary from gene expression data.
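As an illustration of the baseline workflow described above (an RRLM combined with coefficient hypothesis testing), the following is a minimal sketch only. It is not the modified RRLM proposed in the paper, whose details are not given here. The synthetic data, the helper names `ridge_coefficients` and `permutation_pvalues`, the penalty `lam`, the choice of a response-permutation test, and the Bonferroni threshold are all assumptions made for the example.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge regression solution: (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def permutation_pvalues(X, y, lam=1.0, n_perm=500, seed=0):
    """Crude per-coefficient p-values obtained by permuting the response.

    Assumption: permuting y simulates the global null of no association.
    This is one simple testing scheme chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(ridge_coefficients(X, y, lam))
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        exceed += np.abs(ridge_coefficients(X, y_perm, lam)) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical synthetic data: only features 0 and 3 influence the response.
rng = np.random.default_rng(42)
n_samples, n_features = 200, 10
X = rng.standard_normal((n_samples, n_features))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.standard_normal(n_samples)

# Center the data so that no intercept term is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

pvals = permutation_pvalues(X, y)
# Bonferroni correction across the tested coefficients.
selected = np.flatnonzero(pvals < 0.05 / n_features)
print("selected feature indices:", selected)
```

In this sketch the selected variables are only those with a significant multivariate association to the response. Whether such a set can be read as part of the Markov boundary is exactly the question the paper takes up.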