ML, LG · Jun 11, 2015

Mondrian Forests for Large-Scale Regression when Uncertainty Matters

arXiv:1506.03805v4 · 60 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for scalable uncertainty estimation in regression for applications like Bayesian optimization, though it builds incrementally on existing Mondrian forest methods.

The authors tackled the problem of obtaining high-quality uncertainty estimates in large-scale regression tasks, where standard decision forests lack uncertainty quantification and Gaussian processes face scalability issues. They extended Mondrian forests to regression with a hierarchical Gaussian prior, achieving better-calibrated uncertainty assessments and outperforming approximate GPs on large-scale datasets.
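The tree structure that the regression model builds on comes from the Mondrian process used by Lakshminarayanan et al. (2014). The sketch below is a minimal batch sampler, assuming the usual description of that process restricted to the data's bounding box: the split time is the parent's time plus an exponential draw whose rate is the box's total side length, the split dimension is chosen with probability proportional to its extent, and the split location is uniform along that dimension. Sampling stops once a split time exceeds the lifetime `budget`. The names `Node`, `sample_mondrian`, and `count_leaves` are hypothetical and not the authors' code. The sketch also omits the online extension and the Gaussian leaf model.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    lower: List[float]           # lower corner of the data bounding box
    upper: List[float]           # upper corner of the data bounding box
    tau: float                   # split time; math.inf marks a leaf
    dim: Optional[int] = None    # split dimension (internal nodes only)
    loc: Optional[float] = None  # split location (internal nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sample_mondrian(points: Sequence[Sequence[float]], budget: float,
                    rng: random.Random, t_parent: float = 0.0) -> Node:
    """Hypothetical sketch: recursively sample a Mondrian tree on the points' bounding box."""
    d = len(points[0])
    lower = [min(p[j] for p in points) for j in range(d)]
    upper = [max(p[j] for p in points) for j in range(d)]
    extents = [u - l for l, u in zip(lower, upper)]
    rate = sum(extents)  # linear dimension (sum of side lengths) of the box
    if rate == 0.0:
        # All points coincide, so there is nothing left to split.
        return Node(lower, upper, math.inf)
    tau = t_parent + rng.expovariate(rate)
    if tau > budget:
        # The next split would happen after the lifetime budget: make a leaf.
        return Node(lower, upper, math.inf)
    # Choose a dimension with probability proportional to its extent...
    dim = rng.choices(range(d), weights=extents)[0]
    # ...and a location uniformly inside the box along that dimension.
    loc = rng.uniform(lower[dim], upper[dim])
    while loc >= upper[dim]:  # avoid the closed upper endpoint (empty right child)
        loc = rng.uniform(lower[dim], upper[dim])
    left_pts = [p for p in points if p[dim] <= loc]
    right_pts = [p for p in points if p[dim] > loc]
    return Node(lower, upper, tau, dim, loc,
                sample_mondrian(left_pts, budget, rng, tau),
                sample_mondrian(right_pts, budget, rng, tau))


def count_leaves(node: Node) -> int:
    """Number of leaves in a sampled tree."""
    if node.left is None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


# Usage: a small 2-D dataset and a finite lifetime budget.
data_rng = random.Random(0)
pts = [[data_rng.random(), data_rng.random()] for _ in range(200)]
tree = sample_mondrian(pts, budget=5.0, rng=random.Random(1))
print(count_leaves(tree))
```

The budget controls tree depth. Larger values tend to produce finer partitions, and an infinite budget keeps splitting until every leaf holds a single distinct point.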

Many real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates. We extend Mondrian forests, first proposed by Lakshminarayanan et al. (2014) for classification problems, to the large-scale non-parametric regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets, and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.
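To see how per-tree outputs could become a single uncertainty estimate, suppose each tree produces a Gaussian predictive distribution at a test input and the forest averages the trees with equal weights. This is an assumption made for illustration. The forest prediction is then a Gaussian mixture. Its mean is the average of the tree means. Its variance is the average second moment minus the squared mean, so disagreement between trees widens the interval. The function name `forest_predictive` and the example numbers are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def forest_predictive(means: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical sketch: moments of an equally weighted mixture of per-tree Gaussians."""
    if not means or len(means) != len(variances):
        raise ValueError("need one (mean, variance) pair per tree")
    mu = fmean(means)
    # E[y^2] under the mixture: average of each tree's variance + mean^2.
    second_moment = fmean(v + m * m for m, v in zip(means, variances))
    # Clamp tiny negative values caused by floating-point cancellation.
    return mu, max(second_moment - mu * mu, 0.0)


# Hypothetical per-tree predictions at one test input.
mu, var = forest_predictive([1.0, 3.0], [1.0, 1.0])
print(mu, var)  # prints 2.0 2.0
```

In the example, each tree reports variance 1.0, but the forest variance is 2.0. The extra 1.0 comes from the spread of the tree means around 2.0.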

Code Implementations: 1 repo