LGOct 14, 2020

Decision trees as partitioning machines to characterize their generalization properties

arXiv:2010.07374v116 citations
Originality Incremental advance
AI Analysis

This work provides theoretical insights into decision tree generalization, which is incremental but addresses a long-standing gap in understanding for machine learning practitioners.

The paper tackles the problem of weakly bounded generalization properties of decision trees by introducing the concept of partitioning functions to relate to growth functions and VC dimension, resulting in exact VC dimension for decision stumps and an order of N log(Nℓ) for binary trees, with a pruning algorithm outperforming CART on several datasets without cross-validation.

Decision trees are popular machine learning models that are simple to build and easy to interpret. Even though algorithms to learn decision trees date back to almost 50 years, key properties affecting their generalization error are still weakly bounded. Hence, we revisit binary decision trees on real-valued features from the perspective of partitions of the data. We introduce the notion of partitioning function, and we relate it to the growth function and to the VC dimension. Using this new concept, we are able to find the exact VC dimension of decision stumps, which is given by the largest integer $d$ such that $2\ell \ge \binom{d}{\left\lfloor\frac{d}{2}\right\rfloor}$, where $\ell$ is the number of real-valued features. We provide a recursive expression to bound the partitioning functions, resulting in a upper bound on the growth function of any decision tree structure. This allows us to show that the VC dimension of a binary tree structure with $N$ internal nodes is of order $N \log(N\ell)$. Finally, we elaborate a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets, with the advantage that no cross-validation is required.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes