LG DSJan 5, 2021

On the price of explainability for some clustering problems

arXiv:2101.01576v216.032 citations

Originality Incremental advance

AI Analysis

This work is significant for researchers and practitioners in machine learning who need to understand the trade-offs between model performance and interpretability in clustering tasks, providing quantitative bounds and an improved algorithm for explainable k-means.

This paper investigates the unavoidable loss in objective function, termed the price of explainability, when forcing clustering partitions to be explainable via decision trees for k-means, k-medians, k-centers, and maximum-spacing problems. The authors provide improved upper bounds for k-means and k-medians in low dimensions compared to prior work, and introduce a new algorithm for explainable k-means clustering that empirically outperforms current state-of-the-art decision-tree based methods.

The price of explainability for a clustering task can be defined as the unavoidable loss,in terms of the objective function, if we force the final partition to be explainable. Here, we study this price for the following clustering problems: $k$-means, $k$-medians, $k$-centers and maximum-spacing. We provide upper and lower bounds for a natural model where explainability is achieved via decision trees. For the $k$-means and $k$-medians problems our upper bounds improve those obtained by [Moshkovitz et. al, ICML 20] for low dimensions. Another contribution is a simple and efficient algorithm for building explainable clusterings for the $k$-means problem. We provide empirical evidence that its performance is better than the current state of the art for decision-tree based explainable clustering.

View on arXiv PDF

Similar