Detection of Cooperative Interactions in Logistic Regression Models
This paper addresses the challenge of detecting cooperative interactions in logistic regression models for bioinformatics. It offers a method with theoretical guarantees that can outperform generic feature selection algorithms, though the contribution is incremental in that it builds on known graph-based techniques.
The paper tackles the problem of identifying interactive effects among binary covariates for outcome prediction in bioinformatics. It proposes a simple influence measure and, for acyclic interaction graphs, an algorithm based on a maximum-weight spanning tree, and shows that this algorithm can outperform generic feature selection algorithms in recovering the interaction graph from i.i.d. samples.
An important problem in bioinformatics is to identify interactive effects among profiled variables for outcome prediction. In this paper, a logistic regression model with pairwise interactions among a set of binary covariates is considered. Modeling the structure of the interactions by a graph, we aim to recover the interaction graph from independent and identically distributed (i.i.d.) samples of the covariates and the outcome. Viewing this as a feature selection problem, we propose a simple quantity called influence as a measure of the marginal effects of the interaction terms on the outcome. For the case when the underlying interaction graph is known to be acyclic, it is shown that a simple algorithm based on a maximum-weight spanning tree with respect to the plug-in estimates of the influences not only has strong theoretical performance guarantees, but can also outperform generic feature selection algorithms in recovering the interaction graph. Our results can also be extended to the model that includes both individual effects and pairwise interactions with the help of an auxiliary covariate.
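The spanning-tree step described above can be sketched in a few lines of Python. The abstract does not define the influence, so this sketch assumes a stand-in: with covariates and outcome coded as -1/+1, the weight of a pair (i, j) is the absolute sample mean of y * x_i * x_j. The function names `plug_in_influences` and `max_weight_spanning_tree`, the chain graph, and the uniform covariate distribution in the demo are all hypothetical choices made for illustration, not the paper's definitions.

```python
import numpy as np


def plug_in_influences(X, y):
    """Return a symmetric (p, p) matrix of pairwise weights.

    ASSUMPTION: the weight |mean(y * x_i * x_j)| is a stand-in for the
    paper's influence, which the abstract does not spell out.
    X is an (n, p) array and y an (n,) array, both coded as -1/+1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Entry (i, j) of (X * y[:, None]).T @ X is sum_k y_k * x_ki * x_kj.
    W = np.abs((X * y[:, None]).T @ X) / n
    np.fill_diagonal(W, 0.0)  # no self-interactions
    return W


def max_weight_spanning_tree(W):
    """Kruskal's algorithm on a dense weight matrix, heaviest edges first."""
    p = W.shape[0]
    parent = list(range(p))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    edges = sorted(
        ((W[i, j], i, j) for i in range(p) for j in range(i + 1, p)),
        reverse=True,
    )
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == p - 1:
                break
    return tree


if __name__ == "__main__":
    # Hypothetical demo: chain graph 0-1-2-3, unit interaction strengths,
    # independent uniform -1/+1 covariates.
    rng = np.random.default_rng(0)
    n, p = 20000, 4
    X = rng.choice([-1, 1], size=(n, p))
    logit = X[:, 0] * X[:, 1] + X[:, 1] * X[:, 2] + X[:, 2] * X[:, 3]
    prob = 1.0 / (1.0 + np.exp(-logit))
    y = np.where(rng.random(n) < prob, 1, -1)
    print(max_weight_spanning_tree(plug_in_influences(X, y)))
```

This sketch always returns p - 1 edges. How edges would be pruned when the true graph is a forest rather than a tree is not covered by the abstract and is left out here.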