LG Nov 28, 2022

Optimal Sparse Regression Trees

Rui Zhang, Rui Xin, Margo Seltzer, Cynthia Rudin

arXiv:2211.14980v313.013 citations h-index: 60 Has Code

Originality: Highly original

AI Analysis

This work enables efficient construction of interpretable models for high-stakes decisions, a setting where provable optimality was previously computationally infeasible.

The authors tackle the computational challenge of constructing provably-optimal sparse regression trees with a dynamic-programming-with-bounds approach. It uses a novel lower bound based on optimal 1D k-Means clustering of the labels and often finds optimal trees in seconds, even for datasets with many samples and highly-correlated features.

Regression trees are one of the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees. We leverage a novel lower bound based on an optimal solution to the k-Means clustering algorithm in 1-dimension over the set of labels. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly-correlated features.
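To make the lower bound in the abstract concrete, here is a minimal sketch, not the authors' implementation. A regression tree with at most k leaves splits the samples into at most k groups and predicts each group's mean, so its sum of squared errors cannot be lower than the optimal k-Means cost of the labels alone, clustered in one dimension. The function name `kmeans_1d_optimal_cost` and the simple O(k n^2) dynamic program are assumptions chosen for illustration. The sketch also ignores any sparsity penalty or other terms the paper's objective may include.

```python
from typing import Sequence


def kmeans_1d_optimal_cost(labels: Sequence[float], k: int) -> float:
    """Minimum within-cluster sum of squared errors for k clusters in 1D.

    Hypothetical helper for illustration. In 1D an optimal clustering uses
    contiguous runs of the sorted values, so a dynamic program over split
    points finds the exact optimum.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    y = sorted(labels)
    n = len(y)
    if n == 0:
        return 0.0
    k = min(k, n)  # more clusters than points cannot help

    # Prefix sums of values and squared values give O(1) segment costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i: int, j: int) -> float:
        """Squared error of y[i:j] around its own mean (requires j > i)."""
        total = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    inf = float("inf")
    # prev[j]: best cost of covering the first j points with c - 1 clusters.
    prev = [inf] * (n + 1)
    prev[0] = 0.0
    for c in range(1, k + 1):
        cur = [inf] * (n + 1)
        for j in range(c, n + 1):
            # The last cluster is y[i:j]; the first i points use c - 1 clusters.
            cur[j] = min(prev[i] + sse(i, j) for i in range(c - 1, j))
        prev = cur
    return prev[n]


if __name__ == "__main__":
    labels = [1.0, 2.0, 10.0, 11.0]
    # No tree with at most 2 leaves can have squared error below this value.
    print(kmeans_1d_optimal_cost(labels, k=2))
```

In a branch-and-bound search, a bound of this kind lets a subproblem be discarded once the bound already exceeds the best objective found so far. How the paper integrates the bound into its search is not described in this abstract.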
