MLLGSep 16, 2017

Generating Compact Tree Ensembles via Annealing

arXiv:1709.05545v41 citations
Originality Incremental advance
AI Analysis

This work addresses the issue of cyclic behavior and complexity in tree ensemble methods for machine learning practitioners, offering an incremental improvement over existing boosting and Random Forest techniques.

The paper tackles the problem of generating compact and interpretable tree ensembles by introducing a method that grows a large pool of trees in parallel with independent boosting threads, then selects a small subset and optimizes leaf weights via loss optimization. Experiments on real datasets show that the model achieves a smaller loss than boosting, reflected in lower misclassification error on test sets.

Tree ensembles are flexible predictive models that can capture relevant variables and to some extent their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and obtains a more complex and less interpretable model. In this paper we present a novel method for obtaining compact tree ensembles by growing a large pool of trees in parallel with many independent boosting threads and then selecting a small subset and updating their leaf weights by loss optimization. We allow for the trees in the initial pool to have different depths which further helps with generalization. Experiments on real datasets show that the obtained model has usually a smaller loss than boosting, which is also reflected in a lower misclassification error on the test set.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes