LG · CL · Mar 24, 2021

Supporting Clustering with Contrastive Learning

arXiv:2103.12953v2 · 751 citations
AI Analysis

This work addresses poor category separation in distance-based clustering by integrating contrastive learning. The contribution is incremental, since it builds on existing methods.

The paper tackles overlapping categories in unsupervised clustering by proposing SCCL, a framework that uses contrastive learning to improve separation. On most short text clustering benchmarks it reports a 3%-11% improvement in Accuracy and a 4%-15% improvement in Normalized Mutual Information over the previous state of the art.
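For readers unfamiliar with the two metrics named above, here is a minimal sketch of how they are commonly computed for clustering output. It rests on assumptions not stated on this page: Accuracy is taken as the best one-to-one mapping between cluster ids and ground-truth labels (found here by brute force, so it only suits a handful of clusters), and NMI is normalized by the arithmetic mean of the two entropies. The function names are illustrative, and the paper's exact evaluation code may differ.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, pred_clusters):
    """Accuracy under the best one-to-one mapping of cluster ids to labels.

    Brute-force search over mappings; practical only for a few clusters.
    """
    labels = sorted(set(true_labels))
    clusters = sorted(set(pred_clusters))
    # Pad with dummy labels so every cluster can be mapped to something.
    labels = labels + [None] * max(0, len(clusters) - len(labels))
    counts = Counter(zip(pred_clusters, true_labels))
    best = 0
    for assigned in permutations(labels, len(clusters)):
        hits = sum(counts[(c, l)] for c, l in zip(clusters, assigned))
        best = max(best, hits)
    return best / len(true_labels)


def normalized_mutual_information(true_labels, pred_clusters):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_clusters))
    n_true = Counter(true_labels)
    n_pred = Counter(pred_clusters)
    mi = sum(
        (c / n) * math.log(c * n / (n_true[t] * n_pred[k]))
        for (t, k), c in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in n_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in n_pred.values())
    denom = (h_true + h_pred) / 2
    # Both partitions trivial (a single group each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    predicted = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary
    print(clustering_accuracy(truth, predicted))
    print(normalized_mutual_information(truth, predicted))
```

Both metrics ignore the arbitrary numbering of clusters, which is why a relabelled but otherwise identical clustering scores the same as the original.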

Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space. However, different categories often overlap with each other in the representation space at the beginning of the learning process, which poses a significant challenge for distance-based clustering in achieving good separation between different categories. To this end, we propose Supporting Clustering with Contrastive Learning (SCCL) -- a novel framework to leverage contrastive learning to promote better separation. We assess the performance of SCCL on short text clustering and show that SCCL significantly advances the state-of-the-art results on most benchmark datasets with 3%-11% improvement on Accuracy and 4%-15% improvement on Normalized Mutual Information. Furthermore, our quantitative analysis demonstrates the effectiveness of SCCL in leveraging the strengths of both bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances when evaluated with the ground truth cluster labels.
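The abstract describes combining bottom-up instance discrimination with top-down clustering but does not spell out the losses. The sketch below is one plausible reading, under loudly stated assumptions: an NT-Xent-style contrastive loss over two augmented views, a DEC-style clustering loss (Student's t soft assignments pulled toward a sharpened target distribution), and a simple weighted sum of the two. All function names, the temperature, `alpha`, and `cluster_weight` are hypothetical choices for illustration, not values taken from the paper.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cos(a, b):
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))


def instance_contrastive_loss(view_a, view_b, temperature=0.5):
    """NT-Xent-style loss: each embedding should match its own augmented pair."""
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same instance
        logits = [_cos(z[i], z[j]) / temperature for j in range(2 * n) if j != i]
        log_denom = math.log(sum(math.exp(v) for v in logits))
        total += log_denom - _cos(z[i], z[pos]) / temperature
    return total / (2 * n)


def soft_assignments(embeddings, centroids, alpha=1.0):
    """Student's t kernel between each embedding and each centroid, row-normalized."""
    rows = []
    for e in embeddings:
        w = [
            (1 + sum((x - m) ** 2 for x, m in zip(e, mu)) / alpha) ** (-(alpha + 1) / 2)
            for mu in centroids
        ]
        s = sum(w)
        rows.append([v / s for v in w])
    return rows


def target_distribution(q):
    """Sharpen q by squaring and dividing by soft cluster frequency (DEC-style)."""
    freq = [sum(row[k] for row in q) for k in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[k] ** 2 / freq[k] for k in range(len(row))]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def clustering_loss(q, p):
    """Mean KL(p || q) over instances."""
    kl = sum(
        pk * math.log(pk / qk)
        for p_row, q_row in zip(p, q)
        for pk, qk in zip(p_row, q_row)
        if pk > 0
    )
    return kl / len(q)


def joint_loss(view_a, view_b, embeddings, centroids, cluster_weight=1.0):
    """Assumed combination: contrastive term plus weighted clustering term."""
    q = soft_assignments(embeddings, centroids)
    p = target_distribution(q)
    return instance_contrastive_loss(view_a, view_b) + cluster_weight * clustering_loss(q, p)


if __name__ == "__main__":
    originals = [[1.0, 0.0], [0.0, 1.0]]
    view_a = [[1.0, 0.1], [0.1, 1.0]]
    view_b = [[0.9, 0.0], [0.0, 0.9]]
    centroids = [[0.8, 0.2], [0.2, 0.8]]
    print(joint_loss(view_a, view_b, originals, centroids))
```

In a real training loop the embeddings and centroids would be learned parameters and the target distribution would be treated as fixed when taking gradients. This pure-Python version only shows how the two terms are formed and combined.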

Code Implementations · 2 repos
