LG, DS, ML | Jun 4, 2021

Fuzzy Clustering with Similarity Queries

arXiv:2106.02212v2
Originality: Incremental advance
AI Analysis

The paper makes clustering of uncertain or vague datasets more tractable for domain experts. The advance is incremental, since it builds on existing fuzzy clustering methods.

The paper tackles the conjecturally NP-hard fuzzy $k$-means clustering problem by introducing a semi-supervised active clustering framework in which the learner issues similarity queries to an oracle. The result is a polynomial-time approximation algorithm that uses $O(\mathsf{poly}(k)\log n)$ queries, which the authors test on real-world datasets.

The fuzzy or soft $k$-means objective is a popular generalization of the well-known $k$-means problem, extending the clustering capability of $k$-means to datasets that are uncertain, vague, and otherwise hard to cluster. In this paper, we propose a semi-supervised active clustering framework, where the learner is allowed to interact with an oracle (domain expert), asking for the similarity between a certain set of chosen items. We study the query and computational complexities of clustering in this framework. We prove that a few such similarity queries enable one to get a polynomial-time approximation algorithm to an otherwise conjecturally NP-hard problem. In particular, we provide algorithms for fuzzy clustering in this setting that ask $O(\mathsf{poly}(k)\log n)$ similarity queries and run in polynomial time, where $n$ is the number of items. The fuzzy $k$-means objective is nonconvex, with $k$-means as a special case, and is equivalent to other generic nonconvex problems such as non-negative matrix factorization. The ubiquitous Lloyd-type algorithms (or alternating minimization algorithms) can get stuck at a local minimum. Our results show that by making a few similarity queries, the problem becomes easier to solve. Finally, we test our algorithms over real-world datasets, showing their effectiveness in real-world applications.
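For readers unfamiliar with the Lloyd-type baseline the abstract refers to, here is a minimal sketch of alternating minimization for the standard fuzzy $k$-means (fuzzy c-means) objective $\sum_i \sum_j u_{ij}^m \lVert x_i - c_j \rVert^2$ with $\sum_j u_{ij} = 1$. It is not the paper's query-based algorithm and does not model the oracle. The fuzzifier name `m`, the function names, and the random initialization are assumptions made for illustration.

```python
import numpy as np

def squared_distances(X, centers):
    # Pairwise squared Euclidean distances, shape (n_points, k).
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def fuzzy_objective(X, centers, U, m=2.0):
    # Fuzzy k-means cost: sum_i sum_j u_ij^m * ||x_i - c_j||^2.
    return float(((U ** m) * squared_distances(X, centers)).sum())

def update_memberships(X, centers, m=2.0, eps=1e-12):
    # For fixed centers, the optimal u_ij is proportional to d_ij^(-2/(m-1)).
    # eps avoids division by zero when a point coincides with a center.
    d2 = squared_distances(X, centers) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def update_centers(X, U, m=2.0):
    # For fixed memberships, each center is a u_ij^m-weighted mean of the points.
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]

def fuzzy_kmeans(X, k, m=2.0, iters=100, seed=0):
    # Hypothetical driver: alternate the two updates for a fixed number of rounds.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    history = []
    for _ in range(iters):
        U = update_memberships(X, centers, m)
        centers = update_centers(X, U, m)
        history.append(fuzzy_objective(X, centers, U, m))
    return centers, U, history
```

Each round cannot increase the objective, but because the objective is nonconvex, the point the iteration settles at depends on the initial centers and may be only a local minimum. Running `fuzzy_kmeans` with several values of `seed` and comparing the last entry of `history` is one simple way to observe this.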

Code Implementations: 1 repo
