LG DS | Oct 18, 2022

Towards Practical Explainability with Cluster Descriptors

Xiaoyuan Liu, Ilya Tyagin, Hayato Ushijima-Mwesigwa, Indradeep Ghosh, Ilya Safro

arXiv:2210.10662v2 | 3.34 citations | h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses the need for practical explainability in clustering for users who require interpretable results, though it is incremental as it builds on previous models with hardware-specific optimizations.

The paper tackles the problem of making clusters more explainable by finding minimal, pairwise disjoint tag descriptors for each cluster, a problem that is NP-hard in general. It proposes a novel model that keeps tags that do not contribute to explainability out of the descriptors, and demonstrates the solution on specialized hardware (the Fujitsu Digital Annealer) using real-life Twitter and PubMed datasets.

With the rapid development of machine learning, improving its explainability has become a crucial research goal. We study the problem of making clusters more explainable by investigating cluster descriptors. We are given a set of objects $S$, a clustering $\pi$ of these objects, and a set of tags $T$ that did not participate in the clustering algorithm. Each object in $S$ is associated with a subset of $T$. The goal is to find a representative set of tags for each cluster, referred to as the cluster descriptors, subject to the constraint that the descriptors are pairwise disjoint, while minimizing the total size of all descriptors. In general, this problem is NP-hard. We propose a novel explainability model that reinforces previous models so that tags that do not contribute to explainability and do not sufficiently distinguish between clusters are not added to the optimal descriptors. The proposed model is formulated as a quadratic unconstrained binary optimization problem, which makes it suitable for solving on modern optimization hardware accelerators. We experimentally demonstrate how the proposed explainability model can be solved on specialized hardware for accelerating combinatorial optimization, the Fujitsu Digital Annealer, and use real-life Twitter and PubMed datasets as use cases.
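To make the problem statement concrete, here is a minimal Python sketch of the base descriptor problem on a toy instance. It is not the paper's QUBO model and does not use the Digital Annealer: it is a plain exhaustive search, usable only for a handful of tags. It also assumes a coverage rule that the abstract does not spell out, namely that every object must share at least one tag with the descriptor of its own cluster. The function name `min_disjoint_descriptors` and the toy data are hypothetical.

```python
from itertools import product


def min_disjoint_descriptors(object_tags, clustering, tags):
    """Exhaustively search for pairwise-disjoint descriptors of minimum total size.

    object_tags: dict mapping each object to its set of tags
    clustering:  dict mapping each cluster id to the list of its objects
    tags:        list of all available tags
    Returns a dict cluster id -> set of tags, or None if no feasible choice exists.
    """
    cluster_ids = list(clustering)
    best, best_size = None, None
    # Each tag is given to exactly one cluster or to no cluster (None),
    # so the descriptors are pairwise disjoint by construction.
    for assignment in product([None] + cluster_ids, repeat=len(tags)):
        size = sum(a is not None for a in assignment)
        if best_size is not None and size >= best_size:
            continue  # cannot improve on the best solution found so far
        descriptors = {c: set() for c in cluster_ids}
        for tag, c in zip(tags, assignment):
            if c is not None:
                descriptors[c].add(tag)
        # ASSUMED coverage rule: every object shares at least one tag
        # with the descriptor of the cluster it belongs to.
        covered = all(
            object_tags[o] & descriptors[c]
            for c in cluster_ids
            for o in clustering[c]
        )
        if covered:
            best, best_size = descriptors, size
    return best


# Hypothetical toy instance: four objects, two clusters, three tags.
object_tags = {
    "a": {"t1", "t2"},
    "b": {"t2"},
    "c": {"t3"},
    "d": {"t1", "t3"},
}
clustering = {0: ["a", "b"], 1: ["c", "d"]}
tags = ["t1", "t2", "t3"]

result = min_disjoint_descriptors(object_tags, clustering, tags)
print(result)  # {0: {'t2'}, 1: {'t3'}}
```

In this instance the tag `t1` appears in both clusters and is not needed to cover any object, so it is left out of both descriptors. The search visits $(k+1)^{|T|}$ assignments for $k$ clusters, which is why a formulation suited to dedicated optimization hardware becomes attractive for realistic tag sets.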
