ML, LG, ME · Feb 28, 2017

Finding Statistically Significant Interactions between Continuous Features

arXiv:1702.08694v3

Originality: Highly original

AI Analysis

The paper targets fields such as Genetics and Healthcare, where researchers analyze continuous data. Prior work handled only binary features; this paper presents the first solution for continuous features.

The paper tackles the problem of identifying statistically significant higher-order interactions among continuous features. The problem is computationally hard because the candidate space grows combinatorially and every candidate must be accounted for in the multiple-testing correction. The proposed algorithm derives a lower bound on the p-value of each interaction and uses it to prune interactions that can never reach significance. In the reported experiments, the approach efficiently detects all significant interactions in synthetic and real-world datasets.

The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regarding this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm which overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value for each interaction, which enables us to massively prune interactions that can never reach significance and to thereby gain more statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets.
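
The link between a p-value lower bound, pruning, and statistical power can be illustrated with a small sketch. The abstract does not give the bound for continuous features, so the code below assumes the per-candidate lower bounds are already available as plain numbers and applies a generic Tarone-style testability correction. The function name `tarone_threshold` and the example values are hypothetical, and this is not the paper's algorithm.

```python
def tarone_threshold(min_pvalues, alpha=0.05):
    """Tarone-style corrected significance threshold (illustrative sketch).

    min_pvalues: assumed lower bounds on the p-value of each candidate
                 interaction (how to derive them is not shown here).
    Returns (threshold, k), where threshold = alpha / k and k is the
    smallest integer such that at most k candidates are "testable",
    i.e. have a lower bound <= alpha / k.
    """
    k = 1
    # Candidates whose lower bound exceeds alpha / k can never be
    # significant at that level, so they do not count toward the
    # multiple-testing correction.
    while sum(1 for p in min_pvalues if p <= alpha / k) > k:
        k += 1
    return alpha / k, k


# Hypothetical lower bounds for five candidate interactions.
bounds = [0.001, 0.02, 0.2, 0.5, 0.9]
threshold, k = tarone_threshold(bounds, alpha=0.05)

# Indices of candidates that survive pruning and still need a real test.
testable = [i for i, p in enumerate(bounds) if p <= threshold]
print(threshold, k, testable)
```

With these hypothetical bounds, only two of the five candidates remain testable, so the corrected threshold is 0.05 / 2 instead of the Bonferroni value 0.05 / 5. A looser threshold over fewer tests is the sense in which pruning yields more statistical power.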
