LG AI DSDec 1, 2022

Fully-Dynamic Decision Trees

Marco Bressan, Gabriel Damay, Mauro Sozio

arXiv:2212.00778v14.64 citationsh-index: 23

Originality Highly original

AI Analysis

This addresses the challenge of efficiently updating decision trees in streaming or changing data environments, which is incremental as it builds on static decision tree methods but introduces the first fully dynamic algorithm.

The paper tackles the problem of maintaining a decision tree dynamically over insertions and deletions of labeled examples, guaranteeing that every node's split is within an additive ε of the optimum Gini gain. The result includes an amortized running time of O(d log³ n / ε²) for real-valued features and O(d log² n / ε) for binary/categorical features, with space O(n d), and shows near-optimality with lower bounds of Ω(d) time and Ω̃(n d) space.

We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given $ε> 0$ our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive $ε$ of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of $O\big(\frac{d \log^3 n}{ε^2}\big)$, which improves to $O\big(\frac{d \log^2 n}ε\big)$ for binary or categorical features, while it uses space $O(n d)$, where $n$ is the maximum number of examples at any point in time and $d$ is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees uses amortized running time $Ω(d)$ and space $\tildeΩ (n d)$. We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.

View on arXiv PDF

Similar