DCAIOct 27, 2015

Redesigning pattern mining algorithms for supercomputers

arXiv:1510.07787v16 citations
Originality Incremental advance
AI Analysis

This work addresses the need for efficient parallel pattern mining in high-performance computing, particularly for applications like personal genome analysis, though it is incremental in adapting existing methods to new architectures.

The authors tackled the challenge of adapting pattern mining algorithms for distributed-memory supercomputers, achieving up to a 1175-fold speedup using 1200 cores on a dataset with 11,914 items and 697 transactions.

Upcoming many core processors are expected to employ a distributed memory architecture similar to currently available supercomputers, but parallel pattern mining algorithms amenable to the architecture are not comprehensively studied. We present a novel closed pattern mining algorithm with a well-engineered communication protocol, and generalize it to find statistically significant patterns from personal genome data. For distributing communication evenly, it employs global load balancing with multiple stacks distributed on a set of cores organized as a hypercube with random edges. Our algorithm achieved up to 1175-fold speedup by using 1200 cores for solving a problem with 11,914 items and 697 transactions, while the naive approach of separating the search space failed completely.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes