Model Selection for Nonnegative Matrix Factorization by Support Union Recovery
This addresses a gap in NMF literature for practitioners needing theoretical guarantees in model selection, though it is incremental as it builds on existing NMF frameworks.
The paper tackles the problem of automatic model selection for nonnegative matrix factorization (NMF) by proposing an algorithm that estimates latent dimensionality through support union recovery from empirical moments, provably detecting the true dimensionality under a generative model and showing approximate correctness on synthetic data.
Nonnegative matrix factorization (NMF) has been widely used in machine learning and signal processing because of its non-subtractive, part-based property which enhances interpretability. It is often assumed that the latent dimensionality (or the number of components) is given. Despite the large amount of algorithms designed for NMF, there is little literature about automatic model selection for NMF with theoretical guarantees. In this paper, we propose an algorithm that first calculates an empirical second-order moment from the empirical fourth-order cumulant tensor, and then estimates the latent dimensionality by recovering the support union (the index set of non-zero rows) of a matrix related to the empirical second-order moment. By assuming a generative model of the data with additional mild conditions, our algorithm provably detects the true latent dimensionality. We show on synthetic examples that our proposed algorithm is able to find an approximately correct number of components.