Provable guarantees for decision tree induction: the agnostic setting
This addresses the theoretical foundation for widely used decision tree algorithms, offering guarantees in realistic scenarios where data may not be perfectly separable, which is incremental but important for machine learning practitioners.
The paper tackles the problem of providing provable guarantees for top-down decision tree learning heuristics in the agnostic setting, showing they can construct a tree of size s^(tilde{O}((log s)/ε^2)) achieving error ≤ opt_s + ε, with a near-matching lower bound of s^(tilde{Ω}(log s)).
We give strengthened provable guarantees on the performance of widely employed and empirically successful {\sl top-down decision tree learning heuristics}. While prior works have focused on the realizable setting, we consider the more realistic and challenging {\sl agnostic} setting. We show that for all monotone functions~$f$ and parameters $s\in \mathbb{N}$, these heuristics construct a decision tree of size $s^{\tilde{O}((\log s)/\varepsilon^2)}$ that achieves error $\le \mathsf{opt}_s + \varepsilon$, where $\mathsf{opt}_s$ denotes the error of the optimal size-$s$ decision tree for $f$. Previously, such a guarantee was not known to be achievable by any algorithm, even one that is not based on top-down heuristics. We complement our algorithmic guarantee with a near-matching $s^{\tildeΩ(\log s)}$ lower bound.