Bayesian Nonparametric Boolean Factor Models
This work addresses the need for scalable and flexible probabilistic models in data analysis, though it is incremental as it builds upon existing Boolean factorization methods.
The paper tackles the problem of Boolean matrix and tensor factorization by introducing an Indian Buffet Process prior to allow an unspecified number of latent dimensions, resulting in a computationally efficient inference method that scales to billions of observations and is demonstrated on a real-world dataset with 6 million entries.
We build upon probabilistic models for Boolean Matrix and Boolean Tensor factorisation that have recently been shown to solve these problems with unprecedented accuracy and to enable posterior inference to scale to Billions of observation. Here, we lift the restriction of a pre-specified number of latent dimensions by introducing an Indian Buffet Process prior over factor matrices. Not only does the full factor-conditional take a computationally convenient form due to the logical dependencies in the model, but also the posterior over the number of non-zero latent dimensions is remarkably simple. It amounts to counting the number false and true negative predictions, whereas positive predictions can be ignored. This constitutes a very transparent example of sampling-based posterior inference with an IBP prior and, importantly, lets us maintain extremely efficient inference. We discuss applications to simulated data, as well as to a real world data matrix with 6 Million entries.