Memory-Efficient Sequential Pattern Mining with Hybrid Tries
This addresses a critical memory efficiency problem for researchers and practitioners in knowledge discovery dealing with large sequential data sets, representing a strong specific gain rather than a foundational breakthrough.
The paper tackled the memory bottleneck in Sequential Pattern Mining by developing a hybrid trie data structure and algorithm, achieving an average 85% reduction in memory consumption and 49% improvement in computation time on test instances, and enabling processing of large data sets within 256GB of memory.
This paper develops a memory-efficient approach for Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery that faces a well-known memory bottleneck for large data sets. Our methodology involves a novel hybrid trie data structure that exploits recurring patterns to compactly store the data set in memory; and a corresponding mining algorithm designed to effectively extract patterns from this compact representation. Numerical results on small to medium-sized real-life test instances show an average improvement of 85% in memory consumption and 49% in computation time compared to the state of the art. For large data sets, our algorithm stands out as the only capable SPM approach within 256GB of system memory, potentially saving 1.7TB in memory consumption.