LG · ML · May 11, 2025

Improving Random Forests by Smoothing

arXiv:2505.06852v1 · 3 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This is an incremental improvement for machine learning practitioners using random forests in small data scenarios.

The paper addresses the poor performance of random forests in small data regimes, which it attributes to their lack of smoothness. It applies kernel-based smoothing to a learned forest, which consistently improves predictive performance and, in almost all test cases, also improves the log loss of the uncertainty quantification.

Gaussian process regression is a popular model in the small data regime due to its sound uncertainty quantification and the exploitation of the smoothness of the regression function that is encountered in a wide range of practical problems. However, Gaussian processes perform sub-optimally when the degree of smoothness is non-homogeneous across the input domain. Random forest regression partially addresses this issue by providing local basis functions of variable support set sizes that are chosen in a data-driven way. However, it does so at the expense of forgoing any degree of smoothness, which often results in poor performance in the small data regime. Here, we aim to combine the advantages of both models by applying a kernel-based smoothing mechanism to a learned random forest or any other piecewise constant prediction function. As we demonstrate empirically, the resulting model consistently improves the predictive performance of the underlying random forests and, in almost all test cases, also improves the log loss of the usual uncertainty quantification based on inter-tree variance. The latter advantage can be attributed to the ability of the smoothing model to take into account the uncertainty over the exact tree-splitting locations.
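To make the idea of smoothing a piecewise constant predictor concrete, here is a minimal sketch in plain Python. It is not the paper's method: the abstract does not specify the kernel or how it is fitted, so this assumes a one-dimensional input, a Gaussian kernel with a fixed, hand-picked bandwidth, and toy "trees" represented as step functions. All function names and the toy ensemble are hypothetical. With these assumptions the smoothed prediction has a closed form: each cell's value is weighted by the Gaussian mass that falls into the cell. The sketch also computes the inter-tree variance and a Gaussian log loss, one common way to score a mean and variance prediction.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def step_predict(x, breakpoints, values):
    """Piecewise constant prediction of a toy 1-D 'tree'.

    breakpoints must be sorted; values has len(breakpoints) + 1 entries,
    one per cell.
    """
    k = sum(1 for b in breakpoints if x >= b)
    return values[k]


def smoothed_predict(x, breakpoints, values, bandwidth):
    """Gaussian-kernel smoothing of the step function in closed form.

    Equals E[f(x + bandwidth * Z)] with Z ~ N(0, 1): every cell value is
    weighted by the Gaussian probability mass inside that cell.
    """
    edges = [-math.inf] + list(breakpoints) + [math.inf]
    total = 0.0
    for k, value in enumerate(values):
        lo = normal_cdf((edges[k] - x) / bandwidth)
        hi = normal_cdf((edges[k + 1] - x) / bandwidth)
        total += value * (hi - lo)
    return total


def ensemble_mean_var(x, trees, bandwidth):
    """Mean and inter-tree (sample) variance of the smoothed tree predictions."""
    preds = [smoothed_predict(x, b, v, bandwidth) for b, v in trees]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / (len(preds) - 1)
    return mean, var


def gaussian_log_loss(y, mean, var, eps=1e-9):
    """Negative log-likelihood of y under N(mean, var); var is floored at eps."""
    var = max(var, eps)
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


# Hypothetical toy ensemble: three trees that disagree on the split location.
trees = [([0.4], [0.0, 1.0]), ([0.5], [0.0, 1.0]), ([0.6], [0.0, 1.0])]
mean, var = ensemble_mean_var(0.5, trees, bandwidth=0.1)
```

Averaging the step function over Gaussian input noise is equivalent to averaging over Gaussian jitter in the split location. That is one possible reading of the abstract's remark about uncertainty over the exact tree-splitting locations, but it is an interpretation rather than something the abstract states. As the bandwidth shrinks, the smoothed prediction approaches the original step function away from the splits.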

