Two to Five Truths in Non-Negative Matrix Factorization
This work addresses the challenge of better topic modeling for text analysis, offering a method that yields substantial improvements but is incremental as it builds on existing spectral graph clustering techniques.
The paper tackles the problem of improving topic modeling quality in non-negative matrix factorization (NMF) by introducing a matrix scaling inspired by the normalized Laplacian, which significantly enhances performance on text datasets. Results show increases of 50% for Twitter data, over 200% for a newsgroup dataset, and over 40% for clean data in terms of adjusted Rand index compared to using counts.
In this paper, we explore the role of matrix scaling on a matrix of counts when building a topic model using non-negative matrix factorization. We present a scaling inspired by the normalized Laplacian (NL) for graphs that can greatly improve the quality of a non-negative matrix factorization. The results parallel those in the spectral graph clustering work of \cite{Priebe:2019}, where the authors proved adjacency spectral embedding (ASE) spectral clustering was more likely to discover core-periphery partitions and Laplacian Spectral Embedding (LSE) was more likely to discover affinity partitions. In text analysis non-negative matrix factorization (NMF) is typically used on a matrix of co-occurrence ``contexts'' and ``terms" counts. The matrix scaling inspired by LSE gives significant improvement for text topic models in a variety of datasets. We illustrate the dramatic difference a matrix scalings in NMF can greatly improve the quality of a topic model on three datasets where human annotation is available. Using the adjusted Rand index (ARI), a measure cluster similarity we see an increase of 50\% for Twitter data and over 200\% for a newsgroup dataset versus using counts, which is the analogue of ASE. For clean data, such as those from the Document Understanding Conference, NL gives over 40\% improvement over ASE. We conclude with some analysis of this phenomenon and some connections of this scaling with other matrix scaling methods.