MLFeb 15, 2015

Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing

arXiv:1502.04315v163 citations
Originality Highly original
AI Analysis

This work addresses the challenge of significant pattern mining for researchers and practitioners dealing with large datasets, where previous methods had prohibitive computational costs, representing a strong incremental advance in efficiency.

The paper tackles the problem of detecting statistically significant patterns in two-class datasets by introducing Westfall-Young light, a novel algorithm that corrects for multiple testing and correlations via permutation testing, resulting in dramatic improvements in runtime and memory efficiency over the state-of-the-art on real-world benchmarks.

We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, which are statistically significantly enriched in one of two classes. Our method corrects rigorously for multiple hypothesis testing and correlations between patterns through the Westfall-Young permutation procedure, which empirically estimates the null distribution of pattern frequencies in each class via permutations. In our experiments, Westfall-Young light dramatically outperforms the current state-of-the-art approach in terms of both runtime and memory efficiency on popular real-world benchmark datasets for pattern mining. The key to this efficiency is that unlike all existing methods, our algorithm neither needs to solve the underlying frequent itemset mining problem anew for each permutation nor needs to store the occurrence list of all frequent patterns. Westfall-Young light opens the door to significant pattern mining on large datasets that previously led to prohibitive runtime or memory costs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes