ME · ML · Feb 24, 2021

Generalised Boosted Forests

arXiv:2102.12561v2 · 2 citations
AI Analysis

This is an incremental extension of existing boosting methods for random forests to handle non-Gaussian data, relevant for statisticians and data scientists working with generalized linear models.

The paper tackles the problem of modeling non-Gaussian responses by extending boosted random forests to exponential family distributions, resulting in a method that reduces test-set negative log-likelihood in simulations and on real data.

This paper extends recent work on boosting random forests to model non-Gaussian responses. Given an exponential-family response with $\mathbb{E}[Y|X] = g^{-1}(f(X))$, our goal is to obtain an estimate for $f$. We start with an MLE-type estimate in the link space and then define generalised residuals from it. We use these residuals and corresponding weights to fit a base random forest, and then repeat the procedure on the updated fit to obtain a second, boosting random forest. We call the sum of these three estimators a \textit{generalised boosted forest}. We show with simulated and real data that both random forest steps reduce the test-set negative log-likelihood, which we treat as our primary metric. We also provide a variance estimator that can be obtained at the same computational cost as the original estimate. Empirical experiments on real-world data and simulations demonstrate that the methods can effectively reduce bias, and that confidence interval coverage is conservative in the bulk of the covariate distribution.
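The three-stage pipeline in the abstract can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation: we take a Poisson response with the canonical log link; we assume the "generalised residuals" and weights are the IRLS working quantities $r_i = (y_i - \mu_i)\,g'(\mu_i)$ and $w_i = [V(\mu_i)\,g'(\mu_i)^2]^{-1}$, which for this family reduce to $(y_i - \mu_i)/\mu_i$ and $\mu_i$; we use the intercept-only MLE $\log \bar{y}$ as the "MLE-type" starting point; and we form the boosting stage's residuals from the base forest's out-of-bag predictions. The helper names `working_residuals`, `fit_gbf`, `predict_link` and `poisson_nll` are hypothetical; the forests are scikit-learn's `RandomForestRegressor` fitted with `sample_weight`. The paper's variance estimator is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch only. Assumptions (not taken from the paper):
#  * Poisson response with canonical log link, so f(x) = log E[Y | X = x];
#  * "generalised residuals" and weights are the IRLS working quantities;
#  * the stage-0 "MLE-type" estimate is the intercept-only Poisson MLE;
#  * out-of-bag predictions are used to update f between the two forests.


def working_residuals(y, f):
    """IRLS working residual and weight for a Poisson/log-link fit f."""
    mu = np.exp(f)
    resid = (y - mu) / mu  # (y - mu) * g'(mu), with g'(mu) = 1 / mu
    weight = mu            # 1 / (V(mu) * g'(mu)**2), with V(mu) = mu
    return resid, weight


def fit_gbf(X, y, n_estimators=100, seed=0):
    """Return (f0, [base_forest, boost_forest]), all on the link scale."""
    f0 = np.log(np.mean(y))            # intercept-only Poisson MLE (needs mean(y) > 0)
    f = np.full(len(y), f0)
    forests = []
    for stage in range(2):             # stage 0: base forest, stage 1: boosting forest
        r, w = working_residuals(y, f)
        rf = RandomForestRegressor(
            n_estimators=n_estimators,
            min_samples_leaf=5,
            oob_score=True,            # makes oob_prediction_ available after fit
            random_state=seed + stage,
        )
        rf.fit(X, r, sample_weight=w)
        # Update the training-set link fit with out-of-bag predictions, so the
        # next stage does not fit residuals the forest has already memorised.
        f = f + np.ravel(rf.oob_prediction_)
        forests.append(rf)
    return f0, forests


def predict_link(model, X):
    """Sum of the three estimators: f0 + base forest + boosting forest."""
    f0, forests = model
    return f0 + sum(rf.predict(X) for rf in forests)


def poisson_nll(y, f):
    """Mean Poisson negative log-likelihood at link-scale f, dropping log(y!)."""
    return np.mean(np.exp(f) - y * f)
```

For example, with simulated counts `y ~ Poisson(exp(0.5 + sin(x1)))`, one can fit `model = fit_gbf(X_train, y_train)` and compare `poisson_nll` on held-out data for `predict_link(model, X_test)` against the constant stage-0 fit `model[0]`. Lower is better, matching the abstract's use of test-set (negative) log-likelihood as the primary metric.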
