ML · AI · LG · May 29

Correcting Split Selection in Online Decision Trees via Anytime-Valid Inference

arXiv:2605.31239 · 24.9
Predicted impact: top 54% in ML · last 90 days
Originality: Incremental advance
AI Analysis

This work provides a principled statistical foundation for online decision trees, which is crucial for researchers and practitioners relying on the validity of split decisions in streaming data applications. It is an incremental improvement to existing methods.

The paper addresses invalid statistical guarantees in Hoeffding Trees, which are commonly used as base learners in online decision tree ensembles: split decisions are made with data-dependent stopping rules, which invalidate the fixed-sample concentration bounds that existing analyses rely on. The authors propose a method based on anytime-valid inference that controls false splits at any stopping time and, in their experiments, improves performance while producing substantially smaller trees.

Bagging-based ensembles, most notably Adaptive Random Forests, are among the strongest performers for learning from data streams. A common denominator across these methods is their reliance on Hoeffding Trees as base learners, which grow decision trees incrementally by using concentration inequalities to test whether a candidate split is significantly better than its alternatives. Despite their empirical success, existing variants lack valid statistical guarantees. Current analyses rely on fixed-sample concentration bounds, while split decisions are made using data-dependent stopping rules, which invalidates their guarantees and can drive the probability of incorrect splits to one. We introduce a principled alternative based on anytime-valid inference. Our method provides: (i) anytime-valid control of false splits under arbitrary data streams, including non-stationary settings; (ii) finite commitment time under a predictive advantage; and (iii) under stationary i.i.d. data, risk that is monotonically decreasing and strictly improves at every split. Empirically, we evaluate both standalone trees and their use within Adaptive Random Forests on non-stationary streams. Our method improves performance while producing substantially smaller trees.
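
The optional-stopping problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it compares the classic fixed-sample Hoeffding radius, checked after every observation, against a simple union-bound radius that spends delta / (n(n+1)) at time n (these budgets sum to delta, so the bound holds simultaneously for all n under i.i.d. bounded data). The function names, the +1/-1 null stream, and the parameter defaults are illustrative assumptions.

```python
import math
import random


def hoeffding_radius(n, delta, value_range):
    """Fixed-sample Hoeffding radius: valid only for a sample size n fixed in advance."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def anytime_radius(n, delta, value_range):
    """Union-bound radius (illustrative stand-in, not the paper's construction).

    Spends delta / (n * (n + 1)) at time n; the budgets sum to delta over all n >= 1.
    """
    return hoeffding_radius(n, delta / (n * (n + 1)), value_range)


def false_split_rate(radius_fn, delta=0.05, horizon=1000, runs=200, seed=0):
    """Fraction of null streams on which 'mean gain difference > radius' fires at some n <= horizon."""
    false_splits = 0
    for run in range(runs):
        # One generator per stream, so both radius functions see identical streams.
        rng = random.Random(seed + run)
        total = 0.0
        for n in range(1, horizon + 1):
            # Null hypothesis: both candidate splits are equally good, so the
            # observed gain difference is +1 or -1 with mean zero (range 2).
            total += rng.choice((-1.0, 1.0))
            if total / n > radius_fn(n, delta, 2.0):
                false_splits += 1  # a split was committed although none is better
                break
    return false_splits / runs


fixed_rate = false_split_rate(hoeffding_radius)
anytime_rate = false_split_rate(anytime_radius)
print(f"fixed-sample radius, checked at every n: {fixed_rate:.3f}")
print(f"union-bound anytime radius:              {anytime_rate:.3f}")
```

Because the anytime radius is never smaller than the fixed-sample one, every stream that triggers a false split under it also triggers one under the fixed-sample rule. Only the anytime radius keeps the false-split probability below delta regardless of how long the stream is monitored. The union bound is conservative; anytime-valid methods of the kind the abstract refers to aim for tighter boundaries.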

