Continuous Semi-Supervised Nonnegative Matrix Factorization
This is an incremental improvement for researchers in machine learning and text analysis, enhancing topic modeling with regression tasks.
The paper tackles the problem of combining nonnegative matrix factorization with regression on a continuous response variable, showing that this integrated method performs better than performing regression after topic identification while retaining interpretability.
Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In this paper, we show this factorization can be combined with regression on a continuous response variable. In practice, the method performs better than regression done after topics are identified and retrains interpretability.