Stratified-NMF for Heterogeneous Data
This work addresses data heterogeneity in NMF, which matters for applications that pool data from varied sources, but the contribution is incremental because it builds on existing NMF methods.
Classical NMF does not account for heterogeneity in data collected at different times or locations. The authors propose Stratified-NMF, which learns strata-dependent statistics together with a shared topics matrix. They demonstrate its efficiency and accuracy on synthetic data and examine the features it learns on three real-world datasets.
Non-negative matrix factorization (NMF) is an important technique for obtaining low-dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent statistics and a shared topics matrix. We develop multiplicative update rules for this novel objective and prove convergence of the objective. Then, we experiment on synthetic data to demonstrate the efficiency and accuracy of the method. Lastly, we apply our method to three real-world datasets and empirically investigate their learned features.
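The abstract gives no formulas, so the following is only a minimal sketch of what such a method might look like, not the authors' implementation. It assumes that the "strata-dependent statistics" are one nonnegative offset vector `v[s]` per stratum, added to every row of that stratum, and that the objective is the sum over strata of the squared Frobenius error `||A_s - 1 v_s^T - W_s H||^2` with a shared topics matrix `H`. The function name `stratified_nmf`, its parameters, and the update formulas (derived from that assumed objective in the usual Lee-Seung style) are all illustrative.

```python
import numpy as np

def stratified_nmf(strata, rank, n_iter=200, eps=1e-10, seed=0):
    """Sketch of multiplicative updates for an ASSUMED stratified objective.

    strata: list of nonnegative arrays A_s of shape (n_s, m).
    Returns per-stratum offsets v, per-stratum weights W, shared topics H,
    and the loss recorded after each full sweep.
    """
    rng = np.random.default_rng(seed)
    m = strata[0].shape[1]
    v = [rng.random(m) for _ in strata]                    # assumed strata offsets
    W = [rng.random((A.shape[0], rank)) for A in strata]   # per-stratum weights
    H = rng.random((rank, m))                              # shared topics matrix
    losses = []
    for _ in range(n_iter):
        for s, A in enumerate(strata):
            n = A.shape[0]
            # v_s <- v_s * (A_s^T 1) / (n_s v_s + H^T W_s^T 1)
            v[s] *= A.sum(axis=0) / (n * v[s] + H.T @ W[s].sum(axis=0) + eps)
            # W_s <- W_s * (A_s H^T) / (1 v_s^T H^T + W_s H H^T)
            W[s] *= (A @ H.T) / ((H @ v[s]) + W[s] @ (H @ H.T) + eps)
        # H <- H * sum_s(W_s^T A_s) / sum_s(W_s^T 1 v_s^T + W_s^T W_s H)
        num = sum(W[s].T @ A for s, A in enumerate(strata))
        den = sum(np.outer(W[s].sum(axis=0), v[s]) + (W[s].T @ W[s]) @ H
                  for s in range(len(strata)))
        H *= num / (den + eps)
        # A - v[s] broadcasts the offset across rows, i.e. A - 1 v_s^T
        losses.append(sum(np.linalg.norm(A - v[s] - W[s] @ H) ** 2
                          for s, A in enumerate(strata)))
    return v, W, H, losses

# Synthetic example: two strata share topics but have different baselines.
rng = np.random.default_rng(1)
H_true = rng.random((3, 8))
strata = [rng.random((20, 3)) @ H_true + 5.0,
          rng.random((15, 3)) @ H_true + 1.0]
v, W, H, losses = stratified_nmf(strata, rank=3)
```

In this sketch the offsets `v[s]` are free to absorb baseline differences between strata, leaving `H` to capture structure common to all of them. The small constant `eps` is a numerical guard against division by zero and is not part of the assumed objective.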