Entropy Minimizing Matrix Factorization
This work addresses outlier sensitivity in Nonnegative Matrix Factorization (NMF), a common data analysis technique, and offers a robust alternative for applications such as clustering. The contribution appears incremental, since it builds on existing NMF methods and mainly introduces a new loss function.
The authors address the problem of outliers dominating the objective in Nonnegative Matrix Factorization (NMF) by developing an Entropy Minimizing Matrix Factorization (EMMF) framework. EMMF minimizes the entropy of the residue distribution, which allows a few samples to have large errors and prevents outliers from affecting the approximation of normal samples. Clustering results on synthetic and real-world datasets demonstrate its effectiveness compared with state-of-the-art methods.
Nonnegative Matrix Factorization (NMF) is a widely used data analysis technique that has yielded impressive results in many real-world tasks. Existing NMF methods generally represent each sample with several centroids and find the optimal centroids by minimizing the sum of the approximation errors. However, outliers that deviate from the normal data distribution may have large residues and therefore dominate the objective value. In this study, an Entropy Minimizing Matrix Factorization (EMMF) framework is developed to tackle this problem. Considering that outliers are usually far fewer than normal samples, a new entropy loss function is established for matrix factorization; it minimizes the entropy of the residue distribution and allows a few samples to have large approximation errors. In this way, the outliers do not affect the approximation of the normal samples. Multiplicative updating rules for EMMF are also designed, and their convergence is proved theoretically and verified experimentally. In addition, a graph-regularized version of EMMF (G-EMMF) is presented to deal with complex data structures. Clustering results on various synthetic and real-world datasets demonstrate the soundness of the proposed models, and their effectiveness is verified through comparison with state-of-the-art methods.
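The contrast between a sum-of-errors objective and an entropy-of-residues objective can be illustrated with a minimal sketch. This is not the EMMF algorithm: the abstract does not state the EMMF loss or its update rules, so the code below assumes the standard Frobenius-norm multiplicative updates of Lee and Seung as the baseline, and it assumes that the "residue distribution" is formed by normalizing the per-sample squared errors so that they sum to one. The function names `nmf_multiplicative`, `per_sample_residues`, and `residue_entropy` are hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Baseline Frobenius-norm NMF, X ~ W @ H, via Lee-Seung multiplicative updates.

    X has shape (d, n) with one sample per column. This is the standard
    baseline, NOT the EMMF update rule (which the abstract does not give).
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k)) + eps  # nonnegative centroids (basis)
    H = rng.random((k, n)) + eps  # nonnegative per-sample coefficients
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def per_sample_residues(X, W, H):
    """Squared approximation error of each sample (column) of X."""
    return np.linalg.norm(X - W @ H, axis=0) ** 2


def residue_entropy(residues, eps=1e-12):
    """Shannon entropy of the residues after normalizing them to sum to one.

    Assumption: this normalization is one plausible reading of
    "entropy of the residue distribution", not the paper's definition.
    """
    r = np.asarray(residues, dtype=float)
    p = r / (r.sum() + eps)
    return float(-(p * np.log(p + eps)).sum())
```

Under these assumptions, the entropy is largest when every sample has the same error and becomes small when the error is concentrated on a few samples. That matches the intuition in the abstract: an objective that rewards low entropy tolerates a handful of badly approximated samples (the outliers) instead of distorting the centroids to reduce their residues.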