CLIRNov 25, 2024

Contrastive Multi-graph Learning with Neighbor Hierarchical Sifting for Semi-supervised Text Classification

arXiv:2411.16787v18 citationsh-index: 18
Originality Incremental advance
AI Analysis

This work improves semi-supervised text classification for applications like news categorization, though it is incremental with specific enhancements to existing methods.

The paper tackled the problem of graph contrastive learning for semi-supervised text classification by addressing issues like semantic loss in augmentations, neglect of edge features, and false negatives, resulting in competitive accuracy scores of up to 97.52% on benchmark datasets.

Graph contrastive learning has been successfully applied in text classification due to its remarkable ability for self-supervised node representation learning. However, explicit graph augmentations may lead to a loss of semantics in the contrastive views. Secondly, existing methods tend to overlook edge features and the varying significance of node features during multi-graph learning. Moreover, the contrastive loss suffer from false negatives. To address these limitations, we propose a novel method of contrastive multi-graph learning with neighbor hierarchical sifting for semi-supervised text classification, namely ConNHS. Specifically, we exploit core features to form a multi-relational text graph, enhancing semantic connections among texts. By separating text graphs, we provide diverse views for contrastive learning. Our approach ensures optimal preservation of the graph information, minimizing data loss and distortion. Then, we separately execute relation-aware propagation and cross-graph attention propagation, which effectively leverages the varying correlations between nodes and edge features while harmonising the information fusion across graphs. Subsequently, we present the neighbor hierarchical sifting loss (NHS) to refine the negative selection. For one thing, following the homophily assumption, NHS masks first-order neighbors of the anchor and positives from being negatives. For another, NHS excludes the high-order neighbors analogous to the anchor based on their similarities. Consequently, it effectively reduces the occurrence of false negatives, preventing the expansion of the distance between similar samples in the embedding space. Our experiments on ThuCNews, SogouNews, 20 Newsgroups, and Ohsumed datasets achieved 95.86\%, 97.52\%, 87.43\%, and 70.65\%, which demonstrates competitive results in semi-supervised text classification.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes