ML, LG · Feb 8, 2023

Decision trees compensate for model misspecification

Hugh Panton, Gavin Leech, Laurence Aitchison

arXiv:2302.04081v1 · 2.3 · h-index: 25

Originality: Incremental advance

AI Analysis

The paper addresses a problem that matters to practitioners: balancing interpretability and performance in machine learning. The advance is incremental because it builds on existing analyses of tree models.

The paper investigates why decision trees and gradient boosting machines perform well for reasons other than their ability to capture interactions. It finds that part of their success stems from robustness to model misspecification, and it confirms this experimentally on multiple datasets.

The best-performing models in ML are not interpretable. If we can explain why they outperform, we may be able to replicate these mechanisms and obtain both interpretability and performance. One example is decision trees and their descendants, gradient boosting machines (GBMs). These perform well in the presence of complex interactions, with tree depth governing the order of interactions. However, interactions cannot fully account for the depth of trees found in practice. We confirm 5 alternative hypotheses about the role of tree depth in performance in the absence of true interactions, and present results from experiments on a battery of datasets. Part of the success of tree models is due to their robustness to various forms of misspecification. We present two methods for robust generalized linear models (GLMs) addressing the composite and mixed response scenarios.
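The claim that tree depth governs the order of interactions can be illustrated with a toy sketch. It is not taken from the paper: the XOR-style data, the greedy tree learner, and the names `fit_tree` and `train_error` are all assumptions made for illustration. A depth-1 tree splits on a single feature, so it cannot represent a target that depends only on the joint value of two features, whereas a depth-2 tree can.

```python
from statistics import mean

# Hypothetical toy data: the target depends only on the x1-x2 interaction (XOR).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0.0, 1.0, 1.0, 0.0]

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(rows, targets, depth):
    """Greedy regression tree on binary features; returns a predict function."""
    if depth == 0 or len(set(targets)) <= 1:
        leaf = mean(targets)
        return lambda row: leaf
    best = None
    for j in range(len(rows[0])):
        left = [i for i, r in enumerate(rows) if r[j] == 0]
        right = [i for i, r in enumerate(rows) if r[j] == 1]
        if not left or not right:
            continue  # this feature is constant here, so it cannot split
        cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, j, left, right)
    if best is None:
        leaf = mean(targets)
        return lambda row: leaf
    _, j, left, right = best
    lo = fit_tree([rows[i] for i in left], [targets[i] for i in left], depth - 1)
    hi = fit_tree([rows[i] for i in right], [targets[i] for i in right], depth - 1)
    return lambda row: lo(row) if row[j] == 0 else hi(row)

def train_error(depth):
    """Total squared training error of a tree of the given maximum depth."""
    tree = fit_tree(X, y, depth)
    return sum((tree(r) - t) ** 2 for r, t in zip(X, y))

stump_error = train_error(1)   # single split: every leaf predicts 0.5
depth2_error = train_error(2)  # two levels: the interaction is fit exactly
print(stump_error, depth2_error)
```

Because a sum of depth-1 trees is additive in the features, boosting more stumps would not remove the stump's error on this target. Only added depth does, which is the sense in which depth sets the interaction order.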
