Active Learning for Decision Trees with Provable Guarantees

Arshia Soltani Moakhar, Tanapoom Laoaron, Faraz Ghahremani, Kiarash Banihashem, MohammadTaghi Hajiaghayi

arXiv:2601.20775v12.71 citationsh-index: 8

Originality Highly original

AI Analysis

This work addresses the theoretical foundations of active learning for decision trees, which is incremental as it builds on existing active learning frameworks but provides new guarantees for this specific classifier type.

This paper tackles the problem of active learning for decision trees by providing the first theoretical analysis of label complexity under specific assumptions, resulting in an algorithm that uses only a polylogarithmic number of label queries and achieves a (1+ε)-approximate classifier with near-optimal error tolerance dependence.

This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the disagreement coefficient for decision trees-a key parameter governing active learning label complexity. Our analysis holds under two natural assumptions required for achieving polylogarithmic label complexity, (i) each root-to-leaf path queries distinct feature dimensions, and (ii) the input data has a regular, grid-like structure. We show these assumptions are essential, as relaxing them leads to polynomial label complexity. Second, we present the first general active learning algorithm for binary classification that achieves a multiplicative error guarantee, producing a $(1+ε)$-approximate classifier. By combining these results, we design an active learning algorithm for decision trees that uses only a polylogarithmic number of label queries in the dataset size, under the stated assumptions. Finally, we establish a label complexity lower bound, showing our algorithm's dependence on the error tolerance $ε$ is close to optimal.

View on arXiv PDF

Similar