DC AIOct 27, 2015

Redesigning pattern mining algorithms for supercomputers

Kazuki Yoshizoe, Aika Terada, Koji Tsuda

arXiv:1510.07787v11.26 citations

Originality Incremental advance

AI Analysis

This work addresses the need for efficient parallel pattern mining in high-performance computing, particularly for applications like personal genome analysis, though it is incremental in adapting existing methods to new architectures.

The authors tackled the challenge of adapting pattern mining algorithms for distributed-memory supercomputers, achieving up to a 1175-fold speedup using 1200 cores on a dataset with 11,914 items and 697 transactions.

Upcoming many core processors are expected to employ a distributed memory architecture similar to currently available supercomputers, but parallel pattern mining algorithms amenable to the architecture are not comprehensively studied. We present a novel closed pattern mining algorithm with a well-engineered communication protocol, and generalize it to find statistically significant patterns from personal genome data. For distributing communication evenly, it employs global load balancing with multiple stacks distributed on a set of cores organized as a hypercube with random edges. Our algorithm achieved up to 1175-fold speedup by using 1200 cores for solving a problem with 11,914 items and 697 transactions, while the naive approach of separating the search space failed completely.

View on arXiv PDF

Similar