Minimum description length as an objective function for non-negative matrix factorization
This work addresses parameter selection in NMF, offering researchers and practitioners in data analysis a principled alternative to ad-hoc sparsity constraints. It is incremental in that it builds on existing NMF methods.
The paper tackles the problem of selecting parameters and imposing sparsity constraints in non-negative matrix factorization (NMF) by proposing a novel objective function based on the minimum description length (MDL) principle. The objective automatically balances model complexity against accuracy without extensive parameter tuning, and its effectiveness is demonstrated on three heterogeneous datasets and on semi-synthetic data.
Non-negative matrix factorization (NMF) is a dimensionality reduction technique that tends to produce a sparse representation of data. Commonly, the error between the actual and reconstructed matrices is used as the objective function. This objective may not produce the type of representation we desire, because it allows the complexity of the model to grow, constrained only by the size of the subspace and the non-negativity requirement. If additional constraints, such as sparsity, are imposed, the question of parameter selection becomes critical. Instead of adding sparsity constraints in an ad-hoc manner, we propose a novel objective function derived from the principle of minimum description length (MDL). Our formulation, MDL-NMF, automatically trades off the complexity of the model against its accuracy in a principled way, with little parameter selection and little need for domain expertise. We demonstrate that our model works effectively on three heterogeneous datasets and on a range of semi-synthetic data, showing the broad applicability of our method.
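To make the contrast between a pure reconstruction-error objective and a description-length objective concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the MDL-NMF formulation itself, whose details are not given in this abstract. The factorization uses standard multiplicative updates for the Frobenius error. The description length is a crude two-part code: the empirical-entropy cost of the quantized factors plus that of the quantized residual, ignoring the cost of transmitting the codebook. The names `nmf`, `code_length_bits`, `description_length` and the quantization parameter `step` are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates minimizing ||V - WH||_F^2."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Updates keep W and H non-negative as long as V is non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def code_length_bits(values, step):
    """Bits needed to encode `values` quantized to multiples of `step`,
    using an empirical-entropy code (assumption: codebook cost ignored)."""
    symbols = np.round(np.asarray(values, dtype=float).ravel() / step)
    _, counts = np.unique(symbols.astype(np.int64), return_counts=True)
    p = counts / counts.sum()
    return float(-(counts * np.log2(p)).sum())


def description_length(V, W, H, step=0.05):
    """Two-part code: bits for the model (W, H) plus bits for the residual."""
    model_bits = code_length_bits(W, step) + code_length_bits(H, step)
    error_bits = code_length_bits(V - W @ H, step)
    return model_bits + error_bits


if __name__ == "__main__":
    # Synthetic non-negative data with a known inner dimension of 3.
    rng = np.random.default_rng(1)
    V = rng.random((40, 3)) @ rng.random((3, 30))
    for k in range(1, 7):
        W, H = nmf(V, k)
        err = np.linalg.norm(V - W @ H)
        bits = description_length(V, W, H)
        print(f"k={k}  error={err:.4f}  description length={bits:.0f} bits")
```

In this sketch, the reconstruction error alone gives no reason to stop adding components, since a larger factorization can always fit at least as well. The description length charges for every extra entry of W and H, so a larger model is only preferred when it shortens the encoding of the residual by more than it costs to describe. A sparse factor also compresses well under this code, which is one way a description-length objective can favor sparsity without an explicit penalty weight.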