Identifying Higher-order Combinations of Binary Features
This work addresses a computational bottleneck for researchers analyzing high-dimensional binary data, but the contribution is incremental: it speeds up an existing method rather than introducing a new one.
The paper tackles the challenge of scaling the identification of statistically significant interactions between binary variables in high-dimensional datasets. It proposes strategies to speed up an existing approach, one of which runs orders of magnitude faster than the state of the art.
Finding statistically significant interactions between binary variables is computationally and statistically challenging in high-dimensional settings, due to the combinatorial explosion in the number of hypotheses. Terada et al. recently showed how to elegantly address this multiple testing problem by excluding non-testable hypotheses. Still, it remains unclear how their approach scales to large datasets. We here propose strategies to speed up the approach by Terada et al. and evaluate them thoroughly on 11 real-world benchmark datasets. We observe that one approach, incremental search with early stopping, is orders of magnitude faster than the current state-of-the-art approach.
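To make the idea of "excluding non-testable hypotheses" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-sided Fisher's exact test and a Tarone-style correction, neither of which is spelled out in the text above. The function names and the toy numbers are hypothetical. A feature combination that occurs in only a few samples has a minimum attainable p-value that can never fall below the corrected threshold, so it can be dropped from the multiple testing count.

```python
from math import comb


def min_attainable_pvalue(support, n_pos, n):
    """Smallest one-sided Fisher p-value a combination can reach.

    support: number of samples in which the combination occurs
    n_pos:   number of samples in the positive class
    n:       total number of samples
    The minimum is reached by the most extreme contingency table, in
    which as many occurrences as possible fall in the positive class.
    """
    if not 0 <= support <= n:
        raise ValueError("support must lie between 0 and n")
    if support <= n_pos:
        # All occurrences are in the positive class.
        return comb(n_pos, support) / comb(n, support)
    # Every positive sample contains the combination; the rest are negatives.
    return comb(n - n_pos, support - n_pos) / comb(n, support)


def tarone_threshold(supports, n_pos, n, alpha=0.05):
    """Find the smallest k such that at most k hypotheses are testable
    at level alpha / k, and return (alpha / k, k)."""
    min_ps = [min_attainable_pvalue(s, n_pos, n) for s in supports]
    k = 1
    # Terminates because the testable count never exceeds len(supports).
    while sum(p <= alpha / k for p in min_ps) > k:
        k += 1
    return alpha / k, k


# Toy example (hypothetical numbers): 20 samples, 10 positives,
# five candidate combinations with the given supports.
threshold, k = tarone_threshold([1, 2, 5, 8, 10], n_pos=10, n=20)
print(threshold, k)
```

In this toy example only three of the five candidates are testable, so the correction factor is 3 rather than 5. The sketch enumerates all candidates up front; the speed-up strategies described in the abstract, including incremental search with early stopping, concern how to avoid that enumeration and are not reproduced here.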