DSCCLGSep 28, 2018

Minimization of Gini impurity via connections with the k-means problem

arXiv:1810.00029v12 citations
Originality Incremental advance
AI Analysis

This addresses a theoretical bottleneck in machine learning for decision tree construction, but it is incremental as it builds on known connections with k-means.

The paper tackles the problem of computing partitions with minimum weighted Gini impurity for decision trees, showing it is NP-Complete and proposing new algorithms with provable approximations.

The Gini impurity is one of the measures used to select attribute in Decision Trees/Random Forest construction. In this note we discuss connections between the problem of computing the partition with minimum Weighted Gini impurity and the $k$-means clustering problem. Based on these connections we show that the computation of the partition with minimum Weighted Gini is a NP-Complete problem and we also discuss how to obtain new algorithms with provable approximation for the Gini Minimization problem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes