Fast and More Powerful Selective Inference for Sparse High-order Interaction Model
This work addresses selection bias and high dimensionality in interpretable models for domains such as medical diagnosis. The contribution is incremental in that it builds on existing parametric programming methods.
The authors tackle the challenge of identifying statistically significant high-order interactions in the Sparse High-order Interaction Model (SHIM) for automated high-stakes decision-making. They extend a parametric programming approach for selective inference with an efficient pruning strategy, aiming to improve both computational efficiency and statistical power, and demonstrate the method on synthetic and real data.
Automated high-stakes decision-making, such as medical diagnosis, requires models with high interpretability and reliability. In this study we consider the Sparse High-order Interaction Model (SHIM), an interpretable and reliable model with good predictive ability. However, finding statistically significant high-order interactions is challenging because of the intrinsic high dimensionality of the combinatorial effects. Another problem in data-driven modeling is the effect of "cherry-picking", also known as selection bias. Our main contribution is to extend the recently developed parametric programming approach for selective inference to high-order interaction models. An exhaustive search over the cherry tree (all possible interactions) can be daunting and impractical even for a small problem. We introduce an efficient pruning strategy and demonstrate the computational efficiency and statistical power of the proposed method on both synthetic and real data.
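To make the cost of exhaustive search and the idea of pruning concrete, the following is a minimal sketch under stated assumptions. It assumes binary features, so that an interaction is the elementwise product of its member columns, and it discards subtrees with a standard anti-monotone bound on an inner product. This is not necessarily the pruning rule the authors use for selective inference. The function names, the vector r, and the threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
import random


def interaction_column(X, pattern):
    """Product of the binary columns listed in `pattern`, one value per row."""
    return [int(all(row[j] for j in pattern)) for row in X]


def score(col, r):
    """Absolute inner product between an interaction column and the vector r."""
    return abs(sum(c * ri for c, ri in zip(col, r)))


def upper_bound(col, r):
    """Bound on score() for every descendant of the node whose column is `col`.

    Adding a feature to a pattern can only turn 1s into 0s, so a descendant's
    inner product with r lies between the negative part and the positive part
    of the sum taken over the rows where `col` is 1.
    """
    pos = sum(ri for c, ri in zip(col, r) if c and ri > 0)
    neg = -sum(ri for c, ri in zip(col, r) if c and ri < 0)
    return max(pos, neg)


def pruned_search(X, r, max_order, threshold):
    """Depth-first search over the interaction tree, skipping hopeless subtrees."""
    n_features = len(X[0])
    found = []
    visited = 0

    def expand(pattern, col):
        nonlocal visited
        start = pattern[-1] + 1 if pattern else 0
        for j in range(start, n_features):
            child = pattern + (j,)
            child_col = [c & row[j] for c, row in zip(col, X)]
            visited += 1
            if score(child_col, r) >= threshold:
                found.append(child)
            # Descend only if some deeper interaction could still reach the threshold.
            if len(child) < max_order and upper_bound(child_col, r) >= threshold:
                expand(child, child_col)

    expand((), [1] * len(X))
    return sorted(found), visited


def brute_force(X, r, max_order, threshold):
    """Reference implementation: score every interaction up to max_order."""
    n_features = len(X[0])
    found = []
    total = 0
    for k in range(1, max_order + 1):
        for pattern in combinations(range(n_features), k):
            total += 1
            if score(interaction_column(X, pattern), r) >= threshold:
                found.append(pattern)
    return sorted(found), total


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy data: 40 samples, 12 binary features, random vector r.
    X = [[random.randint(0, 1) for _ in range(12)] for _ in range(40)]
    r = [random.gauss(0.0, 1.0) for _ in range(40)]
    kept, visited = pruned_search(X, r, max_order=4, threshold=5.0)
    _, total = brute_force(X, r, max_order=4, threshold=5.0)
    print(f"interactions above threshold: {len(kept)}")
    print(f"nodes visited with pruning: {visited} of {total}")
```

Both functions return the same set of interactions. The pruned search never visits more nodes than the exhaustive one, and how many it skips depends on the data and the threshold.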