LGMLApr 12, 2019

Learning Optimal Decision Trees from Large Datasets

arXiv:1904.06314v14 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of exact decision tree inference for large datasets, which is incremental as it improves scalability over existing exact methods.

The paper tackles the NP-complete problem of inferring optimal decision trees from large datasets, proposing a novel approach based on incremental Boolean formula generation that scales well with slow runtime growth as dataset size increases.

Inferring a decision tree from a given dataset is one of the classic problems in machine learning. This problem consists of buildings, from a labelled dataset, a tree such that each node corresponds to a class and a path between the tree root and a leaf corresponds to a conjunction of features to be satisfied in this class. Following the principle of parsimony, we want to infer a minimal tree consistent with the dataset. Unfortunately, inferring an optimal decision tree is known to be NP-complete for several definitions of optimality. Hence, the majority of existing approaches relies on heuristics, and as for the few exact inference approaches, they do not work on large data sets. In this paper, we propose a novel approach for inferring a decision tree of a minimum depth based on the incremental generation of Boolean formula. The experimental results indicate that it scales sufficiently well and the time it takes to run grows slowly with the size of dataset.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes