LGCOMLJan 25, 2025

Reliable Pseudo-labeling via Optimal Transport with Attention for Short Text Clustering

arXiv:2501.15194v3Has Code
Originality Incremental advance
AI Analysis

This addresses the challenge of low-discriminative representations in short text clustering for data mining applications, representing an incremental advancement.

The paper tackles the problem of short text clustering by proposing a framework that generates reliable pseudo-labels using optimal transport with attention, resulting in improved performance over state-of-the-art methods as demonstrated in extensive experiments.

Short text clustering has gained significant attention in the data mining community. However, the limited valuable information contained in short texts often leads to low-discriminative representations, increasing the difficulty of clustering. This paper proposes a novel short text clustering framework, called Reliable \textbf{P}seudo-labeling via \textbf{O}ptimal \textbf{T}ransport with \textbf{A}ttention for Short Text Clustering (\textbf{POTA}), that generate reliable pseudo-labels to aid discriminative representation learning for clustering. Specially, \textbf{POTA} first implements an instance-level attention mechanism to capture the semantic relationships among samples, which are then incorporated as a semantic consistency regularization term into an optimal transport problem. By solving this OT problem, we can yield reliable pseudo-labels that simultaneously account for sample-to-sample semantic consistency and sample-to-cluster global structure information. Additionally, the proposed OT can adaptively estimate cluster distributions, making \textbf{POTA} well-suited for varying degrees of imbalanced datasets. Then, we utilize the pseudo-labels to guide contrastive learning to generate discriminative representations and achieve efficient clustering. Extensive experiments demonstrate \textbf{POTA} outperforms state-of-the-art methods. The code is available at: \href{https://github.com/YZH0905/POTA-STC/tree/main}{https://github.com/YZH0905/POTA-STC/tree/main}.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes