Quasi-Bayes meets Vines
This work addresses efficient Bayesian computation for high-dimensional data by offering a fully non-parametric density estimator with an analytical form. It is relevant to researchers and practitioners in machine learning and statistics who work with complex datasets.
The paper extends quasi-Bayesian prediction to high dimensions by decomposing the predictive distribution into one-dimensional marginals and a high-dimensional copula, with vine copulas modeling the dependence. It shows that QB-Vine outperforms state-of-the-art methods with analytical forms on density estimation and supervised tasks, on data with up to 64 dimensions and using very few samples (around 200).
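The decomposition described above follows from Sklar's theorem. As a brief illustration, assuming continuous marginals and using notation introduced here rather than taken from the paper ($P_i$ and $p_i$ for the marginal predictive distribution functions and densities, $c$ for the copula density on $[0,1]^d$), the joint predictive density factorizes as

\[
p(x_1, \dots, x_d) = c\big(P_1(x_1), \dots, P_d(x_d)\big) \prod_{i=1}^{d} p_i(x_i).
\]

Each $p_i$ can then be estimated separately in one dimension, while all dependence between coordinates is carried by $c$, which is the part modeled with a vine copula.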
Recently proposed quasi-Bayesian (QB) methods initiated a new era in Bayesian computation by directly constructing the Bayesian predictive distribution through recursion, removing the need for expensive computations involved in sampling the Bayesian posterior distribution. This has proved to be data-efficient for univariate predictions, but extensions to multiple dimensions rely on a conditional decomposition resulting from predefined assumptions on the kernel of the Dirichlet Process Mixture Model, which is the implicit non-parametric model used. Here, we propose a different way to extend quasi-Bayesian prediction to high dimensions: using Sklar's theorem, we decompose the predictive distribution into one-dimensional predictive marginals and a high-dimensional copula. We use the efficient recursive QB construction for the one-dimensional marginals and model the dependence using highly expressive vine copulas. Further, we tune hyperparameters using robust divergences (e.g., the energy score) and show that our proposed Quasi-Bayesian Vine (QB-Vine) is a fully non-parametric density estimator with \emph{an analytical form} and, in some situations, a convergence rate independent of the dimension of the data. Our experiments illustrate that the QB-Vine is appropriate for high-dimensional distributions ($\sim$64 dimensions), needs very few samples to train ($\sim$200), and outperforms state-of-the-art methods with analytical forms for density estimation and supervised tasks by a considerable margin.
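To make the energy score mentioned above concrete, here is a minimal sketch of its standard sample-based estimate, $\mathrm{ES}(P, y) = \mathbb{E}\lVert X - y \rVert - \tfrac{1}{2}\mathbb{E}\lVert X - X' \rVert$ with $X, X' \sim P$, where lower values indicate a better fit. The function name `energy_score` and the toy data are illustrative assumptions; this is not the authors' implementation, and how the paper uses the score within hyperparameter tuning is not shown here.

```python
import numpy as np


def energy_score(samples, y):
    """Sample-based estimate of the energy score of a model at observation y.

    samples: array of shape (m, d), draws from the model (hypothetical input).
    y: array of shape (d,), the observed point.
    Lower is better. The pairwise term averages over all m*m pairs,
    including the zero diagonal, so this is the simple (biased) V-statistic.
    """
    samples = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    # First term: mean Euclidean distance from model samples to the observation.
    term_obs = np.linalg.norm(samples - y, axis=1).mean()
    # Second term: mean Euclidean distance between pairs of model samples.
    diffs = samples[:, None, :] - samples[None, :, :]
    term_pair = np.linalg.norm(diffs, axis=2).mean()
    return term_obs - 0.5 * term_pair


# Toy usage: standard normal samples scored at a central and a distant point.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3))
near = energy_score(samples, np.zeros(3))
far = energy_score(samples, np.full(3, 5.0))
print(near, far)
```

Because the score only needs samples from the model and distances between points, it can be evaluated without a density, which is one reason it is commonly used as a robust alternative to the log-likelihood. The pairwise term costs $O(m^2 d)$ memory in this vectorized form, so a loop or subsampling may be preferable for large $m$.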