LG SI NA Oct 18, 2015

Large Enforced Sparse Non-Negative Matrix Factorization

arXiv:1510.05237v1, 1 citation
Originality: Synthesis-oriented
AI Analysis

This incremental improvement addresses scalability issues for researchers and practitioners using NMF on large text datasets.

The paper tackles the challenge of applying non-negative matrix factorization (NMF) to large datasets by introducing a method that enforces sparsity in intermediate and output matrices, improving memory and compute performance while preserving or enhancing topic model accuracy and convergence rates.

Non-negative matrix factorization (NMF) is a common method for generating topic models from text data. NMF is widely accepted for producing good results despite its relative simplicity of implementation and ease of computation. One challenge with applying NMF to large datasets is that intermediate matrix products often become dense, stressing the memory and compute elements of a system. In this article, we investigate a simple but powerful modification of a common NMF algorithm that enforces the generation of sparse intermediate and output matrices. This method enables the application of NMF to large datasets through improved memory and compute performance. Further, we demonstrate empirically that this method of enforcing sparsity in the NMF either preserves or improves both the accuracy of the resulting topic model and the convergence rate of the underlying algorithm.
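
The idea of enforcing sparsity inside the factorization loop can be illustrated with a minimal sketch. The abstract does not say which NMF algorithm is modified or how sparsity is enforced, so everything specific here is an assumption: the sketch uses Lee-Seung multiplicative updates and truncates each factor to its largest entries after every update. The names `keep_top_k`, `sparse_nmf`, `k_w`, and `k_h` are hypothetical, and the matrices are plain Python lists for readability; a large-scale implementation would presumably use a sparse storage format so that the truncated factors actually save memory.

```python
import random

EPS = 1e-9  # guards against division by zero in the updates


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def keep_top_k(m, k):
    """Assumed sparsity step: keep the k largest entries of m, zero the rest."""
    cells = sorted(
        ((v, i, j) for i, row in enumerate(m) for j, v in enumerate(row)),
        reverse=True,
    )
    out = [[0.0] * len(row) for row in m]
    for v, i, j in cells[:k]:
        out[i][j] = v
    return out


def multiplicative_step(x, numer, denom):
    """Element-wise x * numer / denom, the Lee-Seung update form."""
    return [
        [xv * nv / (dv + EPS) for xv, nv, dv in zip(xr, nr, dr)]
        for xr, nr, dr in zip(x, numer, denom)
    ]


def sparse_nmf(a, rank, k_w, k_h, iters=100, seed=0):
    """Factor a ~ w @ h with at most k_w nonzeros in w and k_h in h."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        h = multiplicative_step(h, matmul(wt, a), matmul(matmul(wt, w), h))
        h = keep_top_k(h, k_h)  # enforce sparsity on the intermediate factor
        ht = transpose(h)
        w = multiplicative_step(w, matmul(a, ht), matmul(w, matmul(h, ht)))
        w = keep_top_k(w, k_w)  # enforce sparsity on the other factor
    return w, h


def count_nonzero(m):
    """Count entries that are not exactly zero."""
    return sum(1 for row in m for v in row if v != 0.0)


# Toy term-document matrix with two visibly separate groups of rows.
A = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
]
W, H = sparse_nmf(A, rank=2, k_w=4, k_h=4)
```

One property of this particular choice is worth noting: under multiplicative updates an entry that has been zeroed stays zero, so the sparsity pattern is effectively fixed by the first truncation. An update rule that can reintroduce nonzeros, such as alternating least squares followed by projection, would behave differently.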

