LG · May 30, 2023

Deep Clustering with Incomplete Noisy Pairwise Annotations: A Geometric Regularization Approach

arXiv:2305.19391v1 · 8 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the sensitivity to annotation noise in deep constrained clustering, offering a provably robust method for incorporating weak supervision into data clustering tasks.

The paper tackles deep constrained clustering with noisy pairwise annotations. It first analyzes the theoretical properties of a logistic loss function, showing that this loss ensures identifiability of data membership. It then proposes a new loss based on geometric regularization that provably identifies membership even under unknown annotation noise. The approach is validated on multiple datasets.

The recent integration of deep learning and pairwise-similarity-annotation-based constrained clustering, i.e., deep constrained clustering (DCC), has proven effective for incorporating weak supervision into large-scale data clustering: annotating fewer than 1% of pairs as similar or dissimilar can often substantially improve clustering accuracy. However, beyond these empirical successes, DCC remains poorly understood. In addition, many DCC paradigms are sensitive to annotation noise, and noisy DCC methods with performance guarantees have remained largely elusive. This work first takes a close look at a recently emerged logistic loss function for DCC and characterizes its theoretical properties. Our result shows that the logistic DCC loss ensures the identifiability of data membership under reasonable conditions, which may shed light on its effectiveness in practice. Building on this understanding, we propose a new loss function based on geometric factor analysis to guard against noisy annotations. We show that even under unknown annotation confusions, the data membership can still be provably identified under the proposed learning criterion. The proposed approach is tested on multiple datasets to validate these claims.
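To make the two kinds of criteria concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes that each point n has a soft membership vector f_n on the probability simplex, that the probability of a "same cluster" annotation is modeled as the inner product f_i^T f_j (logistic/binary cross-entropy loss), that annotation noise enters through a hypothetical K×K confusion matrix A (giving f_i^T A f_j), and that the geometric regularizer is a log-determinant volume term on the membership matrix. The function names, the weight `lam`, and the exact form of the regularizer are assumptions. In practice F would be produced by a neural network rather than passed in directly.

```python
import numpy as np


def logistic_dcc_loss(F, pairs, labels, eps=1e-12):
    """Binary cross-entropy on annotated pairs (sketch, not the paper's code).

    F      : (N, K) soft memberships, each row on the probability simplex
    pairs  : (M, 2) int array of annotated index pairs (i, j)
    labels : (M,) array, 1.0 = "same cluster", 0.0 = "different cluster"
    """
    i, j = pairs[:, 0], pairs[:, 1]
    # Assumed model: P(same cluster) = f_i^T f_j.
    p = np.sum(F[i] * F[j], axis=1)
    p = np.clip(p, eps, 1.0 - eps)  # keep the logs finite
    return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))


def noisy_dcc_loss(F, A, pairs, labels, lam=1e-3, eps=1e-12):
    """Noise-aware BCE plus a volume regularizer (hypothetical formulation).

    A   : (K, K) confusion matrix with entries in [0, 1]; with simplex rows
          in F, f_i^T A f_j then also lies in [0, 1].
    lam : hypothetical weight of the log-det (volume) regularizer.
    """
    i, j = pairs[:, 0], pairs[:, 1]
    # Assumed noisy model: P(annotated "same") = f_i^T A f_j.
    p = np.einsum("mk,kl,ml->m", F[i], A, F[j])
    p = np.clip(p, eps, 1.0 - eps)
    bce = -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    # Geometric term: subtracting lam * log det(F^T F) rewards memberships
    # that span a large volume. A singular F^T F gives -inf here, i.e. +inf loss.
    sign, logdet = np.linalg.slogdet(F.T @ F)
    if sign <= 0:
        return np.inf
    return bce - lam * logdet


# Toy usage: four points, two clusters, three annotated pairs.
F = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
pairs = np.array([[0, 1], [0, 2], [2, 3]])
labels = np.array([1.0, 0.0, 1.0])
print(logistic_dcc_loss(F, pairs, labels))
print(noisy_dcc_loss(F, np.eye(2), pairs, labels, lam=0.1))
```

With A set to the identity and `lam=0`, the noisy criterion reduces to the plain logistic loss. This mirrors the abstract's framing of the noise-robust loss as a generalization of the logistic DCC loss, although the paper's actual parameterization may differ.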

Code implementations: 1 repo
