LG | CV | Jan 26, 2023

Discriminative Entropy Clustering and its Relation to K-means and SVM

arXiv:2301.11405v4 | h-index: 42
Originality: Incremental advance
AI Analysis

This work addresses unsupervised clustering with softmax models and is rated an incremental advance over existing entropy-based methods.

The paper tackles unsupervised clustering by proposing a new self-labeling formulation of entropy clustering for general softmax models, which improves on the state of the art on several standard benchmarks for deep clustering.

Maximization of mutual information between the model's input and output is formally related to "decisiveness" and "fairness" of the softmax predictions, motivating these unsupervised entropy-based criteria for clustering. First, in the context of linear softmax models, we discuss some general properties of entropy-based clustering. Disproving some earlier claims, we point out fundamental differences with K-means. On the other hand, we prove the margin-maximizing property for decisiveness, establishing a relation to SVM-based clustering. Second, we propose a new self-labeling formulation of entropy clustering for general softmax models. The pseudo-labels are introduced as auxiliary variables "splitting" the fairness and decisiveness. The derived self-labeling loss includes the reverse cross-entropy, which is robust to pseudo-label errors, and allows an efficient EM solver for pseudo-labels. Our algorithm improves the state of the art on several standard benchmarks for deep clustering.
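
The link between mutual information, decisiveness, and fairness mentioned in the abstract can be illustrated with the standard empirical estimate: the entropy of the average prediction minus the average entropy of the individual predictions. The sketch below assumes that reading. The function names (`softmax`, `entropy`, `entropy_clustering_terms`) are hypothetical and chosen for illustration, and the code shows only this decomposition, not the paper's self-labeling loss or its EM solver.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax with the usual max-subtraction for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy along the last axis; eps guards against log(0).
    return -np.sum(p * np.log(p + eps), axis=-1)

def entropy_clustering_terms(logits):
    """Return (avg_entropy, marginal_entropy, mutual_information).

    avg_entropy:      mean entropy of per-point predictions
                      (low means "decisive" predictions).
    marginal_entropy: entropy of the mean prediction
                      (high means "fair", i.e. balanced clusters).
    The empirical mutual information estimate is their difference.
    """
    probs = softmax(logits)                         # shape (N, K)
    avg_entropy = entropy(probs).mean()
    marginal_entropy = entropy(probs.mean(axis=0))
    return avg_entropy, marginal_entropy, marginal_entropy - avg_entropy

# Confident, balanced predictions: MI approaches log(K).
confident = 20.0 * np.eye(3)
# Completely undecided predictions: MI is zero.
undecided = np.zeros((3, 3))

print(entropy_clustering_terms(confident))
print(entropy_clustering_terms(undecided))
```

Under these assumptions, minimizing the average entropy while maximizing the marginal entropy pushes each point toward a confident assignment and keeps cluster sizes balanced, which is the intuition behind the two criteria.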
