LG CV, Mar 13, 2023

Collision Cross-entropy for Soft Class Labels and Deep Clustering

arXiv:2303.07321v32.02 citations, h-index: 42

Originality: Highly original

AI Analysis

This paper addresses the challenge of handling ambiguous or uncertain labels in machine learning, particularly in self-labeled clustering methods, and offers a novel loss function that improves performance in these scenarios.

The paper tackles the problem of learning from soft class labels in classification and clustering by proposing collision cross-entropy as a robust alternative to Shannon's cross-entropy, which inhibits generalization when labels are uncertain. It shows that collision cross-entropy improves state-of-the-art results in discriminative deep clustering and speeds up pseudo-label estimation with an efficient EM algorithm.

We propose "collision cross-entropy" as a robust alternative to Shannon's cross-entropy (CE) loss when class labels are represented by soft categorical distributions y. In general, soft labels can naturally represent ambiguous targets in classification. They are particularly relevant for self-labeled clustering methods, where latent pseudo-labels are jointly estimated with the model parameters and uncertainty is prevalent. In the case of soft labels, Shannon's CE teaches the model predictions to reproduce the uncertainty in each training example, which inhibits the model's ability to learn and generalize from these examples. As an alternative loss, we propose the negative log of "collision probability" that maximizes the chance of equality between two random variables, the predicted class and the unknown true class. We show that it has the properties of a generalized CE. The proposed collision CE agrees with Shannon's CE for one-hot labels, but training from soft labels differs. For example, unlike Shannon's CE, data points where y is a uniform distribution have zero contribution to the training. Collision CE significantly improves classification supervised by soft uncertain targets. Unlike Shannon's CE, collision CE is symmetric with respect to y and the network predictions, which is particularly relevant when both distributions are estimated in the context of self-labeled clustering. Focusing on discriminative deep clustering, where self-labeling and entropy-based losses are dominant, we show that the use of collision CE improves the state of the art. We also derive an efficient EM algorithm that significantly speeds up the pseudo-label estimation with collision CE.
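The properties claimed in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the collision probability of a label distribution y and a prediction p is the inner product sum_k y_k p_k (the chance that two independent draws, one from each distribution, coincide), so that the loss is its negative log. The function names `shannon_ce` and `collision_ce` and the `eps` stabilizer are illustrative choices, not taken from the paper.

```python
import numpy as np


def shannon_ce(y, p, eps=1e-12):
    """Shannon cross-entropy: -sum_k y_k * log(p_k)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    # eps guards against log(0); it is an illustrative stabilizer.
    return float(-np.sum(y * np.log(p + eps)))


def collision_ce(y, p, eps=1e-12):
    """Assumed form of collision cross-entropy: -log(sum_k y_k * p_k)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    # The inner product is the probability that a class drawn from y
    # equals a class drawn independently from p.
    return float(-np.log(np.dot(y, p) + eps))


# One-hot label: both losses reduce to -log(p_true).
one_hot = [0.0, 1.0, 0.0]
pred = [0.2, 0.7, 0.1]
same_for_one_hot = np.isclose(shannon_ce(one_hot, pred),
                              collision_ce(one_hot, pred))

# Soft label: collision CE is symmetric in its arguments, Shannon's CE is not.
soft = [0.5, 0.4, 0.1]
is_symmetric = np.isclose(collision_ce(soft, pred), collision_ce(pred, soft))

# Uniform label: the loss equals log(K) for every prediction, so it is
# constant in p and contributes no gradient.
uniform = [1 / 3, 1 / 3, 1 / 3]
uniform_a = collision_ce(uniform, pred)
uniform_b = collision_ce(uniform, [0.9, 0.05, 0.05])
```

Under this assumed form, a uniform label gives a loss of log K regardless of the prediction, which is consistent with the abstract's statement that such data points have zero contribution to training.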
