Coupled Compound Poisson Factorization
This work addresses non-random missing data in sparse datasets for applications such as clustering and prediction. Its contribution is incremental: it integrates existing models into a new framework.
The authors model non-random missing data in extremely sparse datasets by introducing the Coupled Compound Poisson Factorization (CCPF) framework, which couples hierarchical Poisson factorization with an arbitrary data-generating model. They show that explicitly modeling the missing-data mechanism substantially improves results in clustering, prediction, and matrix factorization, as measured by data log likelihood on a held-out test set.
We present a general framework, the coupled compound Poisson factorization (CCPF), to capture the missing-data mechanism in extremely sparse data sets by coupling a hierarchical Poisson factorization with an arbitrary data-generating model. We derive a stochastic variational inference algorithm for the resulting model and, as examples of our framework, implement three different data-generating models (a mixture model, linear regression, and factor analysis) to robustly model non-random missing data in the context of clustering, prediction, and matrix factorization. In all three cases, we test our framework against models that ignore the missing-data mechanism on large-scale studies with non-random missing data, and we show that explicitly modeling the missing-data mechanism substantially improves the quality of the results, as measured using data log likelihood on a held-out test set.
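To make the coupling idea concrete, the following is a minimal generative sketch, not the authors' model or inference algorithm. It assumes a simplified, non-hierarchical Poisson factorization with Gamma-distributed factors that decides which entries are observed (an entry is observed when its latent count is positive), and a two-component Gaussian mixture over rows as the data-generating model. The function names, hyperparameters, and the exact form of the coupling are illustrative assumptions; the abstract does not specify them.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_sparse_matrix(n_rows=50, n_cols=40, n_factors=3, seed=0):
    """Simulate a sparse matrix whose missingness follows a Poisson factorization.

    Hypothetical setup: all shapes, scales, and cluster means are assumptions
    chosen only so that most entries end up missing.
    """
    rng = random.Random(seed)

    # Non-negative latent factors for rows and columns (Gamma shape 0.3, scale 0.5).
    row_factors = [[rng.gammavariate(0.3, 0.5) for _ in range(n_factors)]
                   for _ in range(n_rows)]
    col_factors = [[rng.gammavariate(0.3, 0.5) for _ in range(n_factors)]
                   for _ in range(n_cols)]

    # Data-generating model: each row belongs to one of two Gaussian clusters.
    cluster_means = [-2.0, 3.0]
    clusters = [rng.randrange(2) for _ in range(n_rows)]

    observed = {}
    for u in range(n_rows):
        for i in range(n_cols):
            # The Poisson rate is the inner product of the row and column factors.
            rate = sum(a * b for a, b in zip(row_factors[u], col_factors[i]))
            # Missing-data mechanism: the entry exists only if the count is positive.
            if sample_poisson(rate, rng) > 0:
                observed[(u, i)] = rng.gauss(cluster_means[clusters[u]], 1.0)
    return observed, clusters


if __name__ == "__main__":
    observed, clusters = simulate_sparse_matrix()
    total = 50 * 40
    print(f"observed {len(observed)} of {total} entries")
```

Because the factors differ across rows and columns, some rows and columns are far more likely to be observed than others, so the missingness in this sketch is not uniform at random. A model that ignores the mask would treat every missing entry the same way, which is the kind of baseline the abstract compares against.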