Aug 14, 2012

On Tensors, Sparsity, and Nonnegative Factorizations

arXiv:1112.2414, 315 citations, h-index: 49
Originality: Incremental advance
AI Analysis

For practitioners analyzing sparse count data (e.g., in chemometrics or signal processing), this paper provides a principled factorization method that handles zeros better than Gaussian-based approaches.

This paper develops a Poisson-based tensor factorization model for sparse count data, introducing the CP-APR algorithm that generalizes Lee-Seung multiplicative updates. The algorithm is proven to converge under mild conditions and demonstrates effectiveness on real and simulated datasets.

Tensors have found application in a variety of fields, ranging from chemometrics to signal processing and beyond. In this paper, we consider the problem of multilinear modeling of sparse count data. Our goal is to develop a descriptive tensor factorization model of such data, along with appropriate algorithms and theory. To do so, we propose that the random variation is best described via a Poisson distribution, which better describes the zeros observed in the data as compared to the typical assumption of a Gaussian distribution. Under a Poisson assumption, we fit a model to observed data using the negative log-likelihood score. We present a new algorithm for Poisson tensor factorization called CANDECOMP-PARAFAC Alternating Poisson Regression (CP-APR) that is based on a majorization-minimization approach. It can be shown that CP-APR is a generalization of the Lee-Seung multiplicative updates. We show how to prevent the algorithm from converging to non-KKT points and prove convergence of CP-APR under mild conditions. We also explain how to implement CP-APR for large-scale sparse tensors and present results on several data sets, both real and simulated.
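The abstract states that CP-APR generalizes the Lee-Seung multiplicative updates and that the fit is scored by the Poisson negative log-likelihood. As a rough illustration of that connection, the sketch below implements only the classical two-way (matrix) Lee-Seung updates for the Poisson/KL objective, not CP-APR itself. The function names `poisson_nll` and `lee_seung_kl`, the random initialization, and the small constant `eps` are assumptions made for this example, and the sketch omits the safeguard against converging to non-KKT points that the paper describes.

```python
import numpy as np


def poisson_nll(V, W, H, eps=1e-12):
    """Poisson negative log-likelihood of counts V under the model M = W @ H.

    The constant log(V!) term is dropped because it does not depend on W or H.
    """
    M = W @ H
    return float(np.sum(M - V * np.log(M + eps)))


def lee_seung_kl(V, rank, n_iter=50, seed=0, eps=1e-12):
    """Two-way Lee-Seung multiplicative updates for the Poisson/KL objective.

    Illustrative sketch only: this is the matrix special case, not the
    CP-APR tensor algorithm, and it has no fix for entries stuck at zero.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive start so no entry is locked at zero from the outset.
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    history = [poisson_nll(V, W, H, eps)]
    for _ in range(n_iter):
        # Update H with W fixed: scale each entry by a nonnegative ratio.
        M = W @ H + eps
        H *= (W.T @ (V / M)) / (W.sum(axis=0)[:, None] + eps)
        # Update W with H fixed, using the refreshed model values.
        M = W @ H + eps
        W *= ((V / M) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(poisson_nll(V, W, H, eps))
    return W, H, history


if __name__ == "__main__":
    # Small synthetic sparse count matrix (hypothetical data for the demo).
    rng = np.random.default_rng(1)
    V = rng.poisson(0.3, size=(8, 6)).astype(float)
    W, H, history = lee_seung_kl(V, rank=2, n_iter=30)
    print("objective at start and end:", history[0], history[-1])
```

Because each update multiplies the current factor by a nonnegative ratio, the factors stay nonnegative without any projection step. The same property means an entry that reaches zero stays there, which is the kind of behavior the paper's non-KKT safeguard is meant to address.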
