LG · Feb 14, 2023

Scalable Optimal Multiway-Split Decision Trees with Constraints

arXiv:2302.06812v1 · 8.86 citations · h-index: 12

Originality: Highly original

AI Analysis

This paper addresses the problem of scaling optimal decision tree learning to practical applications with large datasets and complex constraints, though it is incremental in that it builds on prior MIP-based methods.

The paper tackles the scalability limitations of existing mixed-integer programming (MIP) methods for learning optimal decision trees by proposing a novel path-based formulation that reduces variable count and a column generation framework, achieving up to a 24X runtime reduction and handling datasets with over 1 million samples.

There has been a surge of interest in learning optimal decision trees using mixed-integer programs (MIP) in recent years, as heuristic-based methods do not guarantee optimality and find it challenging to incorporate constraints that are critical for many practical applications. However, existing MIP methods that build on an arc-based formulation do not scale well as the number of binary variables is on the order of $\mathcal{O}(2^dN)$, where $d$ and $N$ refer to the depth of the tree and the size of the dataset. Moreover, they can only handle sample-level constraints and linear metrics. In this paper, we propose a novel path-based MIP formulation where the number of decision variables is independent of $N$. We present a scalable column generation framework to solve the MIP optimally. Our framework produces a multiway-split tree which is more interpretable than the typical binary-split trees due to its shorter rules. Our method can handle nonlinear metrics such as F1 score and incorporate a broader class of constraints. We demonstrate its efficacy with extensive experiments. We present results on datasets containing up to 1,008,372 samples while existing MIP-based decision tree models do not scale well on data beyond a few thousand points. We report superior or competitive results compared to the state-of-the-art MIP-based methods with up to a 24X reduction in runtime.
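Two claims in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the helper names are hypothetical, the constant factor hidden by the $\mathcal{O}(2^dN)$ bound is assumed to be 1, and the leaf counts are made-up numbers. It shows how the arc-based binary-variable count grows with $N$, and why F1 is called nonlinear: unlike accuracy, it is a ratio of counts, so the F1 of a whole tree is generally not the average of its per-leaf F1 scores.

```python
def arc_based_binary_vars(depth: int, n_samples: int) -> int:
    """Order-of-magnitude count 2^d * N from the abstract.

    Assumption: the constant factor hidden by the O(.) bound is taken as 1.
    """
    return (2 ** depth) * n_samples


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from confusion counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Growth of the arc-based variable count for a few dataset sizes and depths.
for n in (1_000, 100_000, 1_008_372):
    print(n, [arc_based_binary_vars(d, n) for d in (2, 3, 4)])

# Hypothetical (TP, FP, FN) counts for two leaves of some tree.
leaf_a = (8, 2, 0)
leaf_b = (1, 0, 9)
pooled = tuple(a + b for a, b in zip(leaf_a, leaf_b))

# F1 over the pooled counts differs from the mean of the per-leaf F1 scores,
# so it cannot be written as a sum of independent per-leaf terms.
tree_f1 = f1(*pooled)
mean_leaf_f1 = (f1(*leaf_a) + f1(*leaf_b)) / 2
print(tree_f1, mean_leaf_f1)
```

Under these assumptions, a depth-3 tree on the largest dataset mentioned (1,008,372 samples) would already imply roughly eight million binary variables in an arc-based model, which is consistent with the abstract's motivation for a formulation whose size does not depend on $N$.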
