PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization
This work targets NMF, a compute-intensive kernel in applications such as topic modeling and bioinformatics, and offers substantial gains in parallel efficiency for researchers and practitioners in these domains.
The paper tackles the performance bottleneck in parallel Non-negative Matrix Factorization (NMF) by developing PL-NMF, a new algorithm that optimizes data locality and achieves significant performance improvements over existing state-of-the-art parallel NMF algorithms on multi-core CPUs and GPUs.
Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including topic modeling, recommender systems and bioinformatics. Due to the compute-intensive nature of applications that must perform repeated NMF, several parallel implementations have been developed in the past. However, existing parallel NMF algorithms have not addressed data locality optimizations, which are critical for high performance since data movement costs greatly exceed the cost of arithmetic/logic operations on current computer systems. In this paper, we devise a parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme that incorporates algorithmic transformations to enhance data locality. Efficient realizations of the algorithm on multi-core CPUs and GPUs are developed, demonstrating significant performance improvement over existing state-of-the-art parallel NMF algorithms.
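To make the HALS scheme concrete, here is a minimal sketch of a plain, sequential HALS iteration for A ≈ WH. It is written in Python with NumPy, which is an assumption, since the paper's implementations target multi-core CPUs and GPUs. It shows the textbook column-by-column HALS update, not PL-NMF's locality-optimized variant. The function name `hals_nmf` and its parameters (`iters`, `seed`, `eps`) are hypothetical.

```python
import numpy as np


def hals_nmf(A, k, iters=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix A (m x n) as W @ H, W (m x k), H (k x n).

    Hypothetical illustration of plain HALS: each column of W (then each row
    of H) is set to its exact non-negative least-squares minimizer while all
    other columns/rows are held fixed. No locality optimizations are applied.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # W update: needs A H^T (m x k) and the Gram matrix H H^T (k x k).
        AHt = A @ H.T
        HHt = H @ H.T
        for j in range(k):
            # W @ HHt[:, j] uses columns already updated in this sweep
            # (Gauss-Seidel style), as HALS requires.
            W[:, j] = np.maximum(eps, W[:, j] + (AHt[:, j] - W @ HHt[:, j]) / HHt[j, j])
        # H update: needs W^T A (k x n) and the Gram matrix W^T W (k x k).
        WtA = W.T @ A
        WtW = W.T @ W
        for j in range(k):
            H[j, :] = np.maximum(eps, H[j, :] + (WtA[j, :] - WtW[j, :] @ H) / WtW[j, j])
    return W, H
```

Clamping at a small `eps` instead of zero keeps the diagonal Gram entries positive, so the divisions stay well defined. Each inner update of `W[:, j]` reads all of `W` through `W @ HHt[:, j]`, so one sweep streams `W` (and likewise `H`) about k times. That repeated data movement is the kind of cost the abstract says locality optimizations are meant to reduce.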