LG ML | Oct 23, 2025

DiNo and RanBu: Lightweight Predictions from Shallow Random Forests

Tiago Mendonça dos Santos, Rafael Izbicki, Luís Gustavo Esteves

arXiv:2510.23624v1 | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses deployment challenges in latency-sensitive or resource-constrained environments for tabular prediction tasks and represents an incremental improvement over existing random forest methods.

The paper tackles the high inference latency and memory demands of random forests built from many deep trees by introducing DiNo and RanBu, two shallow-forest methods that convert depth-limited trees into efficient predictors. RanBu matches or exceeds full-depth forest accuracy while reducing combined training and inference time by up to 95% on the paper's benchmarks.

Random Forest ensembles are a strong baseline for tabular prediction tasks, but their reliance on hundreds of deep trees often results in high inference latency and memory demands, limiting deployment in latency-sensitive or resource-constrained environments. We introduce DiNo (Distance with Nodes) and RanBu (Random Bushes), two shallow-forest methods that convert a small set of depth-limited trees into efficient, distance-weighted predictors. DiNo measures cophenetic distances via the most recent common ancestor of observation pairs, while RanBu applies kernel smoothing to Breiman's classical proximity measure. Both approaches operate entirely after forest training: no additional trees are grown, and tuning of the single bandwidth parameter $h$ requires only lightweight matrix-vector operations. Across three synthetic benchmarks and 25 public datasets, RanBu matches or exceeds the accuracy of full-depth random forests, particularly in high-noise settings, while reducing training plus inference time by up to 95%. DiNo achieves the best bias-variance trade-off in low-noise regimes at a modest computational cost. Both methods extend directly to quantile regression, maintaining accuracy with substantial speed gains. The implementation is available as an open-source R/C++ package at https://github.com/tiagomendonca/dirf. We focus on structured tabular random samples (i.i.d.), leaving extensions to other modalities for future work.
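The two distance notions in the abstract, and the kernel-weighted prediction built on them, can be illustrated with a small pure-Python sketch. It is not the authors' R/C++ implementation, and several details are assumptions made only for illustration: trees are taken to be complete binary trees with heap-style node indices (root = 1, children of node i are 2i and 2i + 1), the RanBu-like distance is taken as one minus Breiman's proximity, the DiNo-like distance is normalized by tree depth, the kernel is Gaussian, and the bandwidth is chosen by leave-one-out squared error. All function names are hypothetical.

```python
import math


def breiman_proximity(leaves_a, leaves_b):
    """Fraction of trees in which two observations fall in the same leaf."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def mrca_depth(node_a, node_b):
    """Depth of the most recent common ancestor of two heap-indexed nodes."""
    while node_a != node_b:
        # The larger index is never shallower, so move it up to its parent.
        if node_a > node_b:
            node_a //= 2
        else:
            node_b //= 2
    return node_a.bit_length() - 1  # the root (index 1) has depth 0


def mrca_distance(leaves_a, leaves_b, max_depth):
    """Assumed normalization: 0 for a shared leaf, 1 for a split at the root."""
    per_tree = [(max_depth - mrca_depth(a, b)) / max_depth
                for a, b in zip(leaves_a, leaves_b)]
    return sum(per_tree) / len(per_tree)


def kernel_predict(distances, targets, h):
    """Kernel-weighted average of targets with a Gaussian kernel (assumed)."""
    weights = [math.exp(-((d / h) ** 2)) for d in distances]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed; fall back to the plain mean
        return sum(targets) / len(targets)
    return sum(w * y for w, y in zip(weights, targets)) / total


def tune_bandwidth(dist_matrix, targets, grid):
    """Pick h from a grid by leave-one-out squared error; no trees are regrown."""
    def loo_error(h):
        err = 0.0
        for i, row in enumerate(dist_matrix):
            others_d = row[:i] + row[i + 1:]
            others_y = targets[:i] + targets[i + 1:]
            err += (kernel_predict(others_d, others_y, h) - targets[i]) ** 2
        return err / len(targets)
    return min(grid, key=loo_error)


# Toy data: leaf index of each observation in three depth-2 trees.
train_leaves = [[4, 5, 4], [4, 5, 6], [7, 6, 7]]
train_y = [1.0, 2.0, 10.0]
query_leaves = [4, 5, 4]

ranbu_like = [1.0 - breiman_proximity(query_leaves, t) for t in train_leaves]
dino_like = [mrca_distance(query_leaves, t, max_depth=2) for t in train_leaves]
prediction = kernel_predict(ranbu_like, train_y, h=0.5)
```

In this sketch the proximity-based distance only records whether two observations share a leaf, while the ancestor-based distance also distinguishes pairs that separate late in a tree from pairs that separate at the root. A small bandwidth pulls the prediction toward the nearest training targets, and a large one toward the overall mean.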
