IR LG · Apr 15, 2021

Vec2GC -- A Graph Based Clustering Method for Text Representations

arXiv:2104.09439v2 · 1 citation
AI Analysis

It addresses the need for better unsupervised document processing in NLP, though the contribution appears incremental, since it builds on existing graph-based and density-based clustering techniques.

The paper tackles unsupervised clustering of terms or documents in NLP pipelines that have little or no labeled data. It introduces Vec2GC, a density-based method that applies community detection to weighted graphs built from text representations and also supports hierarchical clustering.

NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method uses community detection on a weighted graph of the terms or documents, created using text representation learning. The Vec2GC clustering algorithm is a density-based approach that also supports hierarchical clustering.
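As a rough illustration of the pipeline in the abstract (vectors, then a weighted graph, then communities, optionally refined level by level), the following Python sketch clusters a handful of toy vectors. It is not the authors' Vec2GC implementation. The toy vectors are hypothetical stand-ins for learned text representations, edge weights are plain cosine similarities kept only above a hypothetical `threshold`, weighted label propagation is swapped in as a simple community-detection step, and the hierarchy comes from re-clustering each community with a stricter, hypothetical threshold. Nodes with no sufficiently similar neighbours stay as singletons, which loosely mirrors the density-based behaviour the abstract mentions.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def build_graph(vectors, threshold):
    """Weighted, undirected graph: keep an edge only if similarity >= threshold.

    Assumption: edge weight = raw cosine similarity; the paper may weight edges differently.
    """
    names = sorted(vectors)
    graph = {n: {} for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def label_propagation(graph, max_iter=100):
    """Weighted label propagation, a simple stand-in for community detection.

    Each node adopts the label with the largest total edge weight among its
    neighbours; ties go to the smallest label so results are deterministic.
    Isolated nodes keep their own label, so they end up as singletons.
    """
    labels = {n: n for n in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue
            scores = defaultdict(float)
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def communities(labels):
    """Group nodes by label into a sorted list of sorted member lists."""
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted(sorted(g) for g in groups.values())


def hierarchical(vectors, thresholds):
    """Cluster at thresholds[0], then re-cluster each community with the remaining thresholds."""
    if not thresholds or len(vectors) <= 1:
        return sorted(vectors)
    graph = build_graph(vectors, thresholds[0])
    tree = []
    for group in communities(label_propagation(graph)):
        sub = {name: vectors[name] for name in group}
        tree.append(hierarchical(sub, thresholds[1:]))
    return tree


# Hypothetical toy "embeddings"; a real pipeline would use learned text representations.
VECTORS = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.2, 0.0],
    "pet": [0.95, 0.15, 0.05],
    "car": [0.0, 0.1, 1.0],
    "bus": [0.05, 0.0, 0.9],
    "truck": [0.1, 0.1, 0.95],
}

if __name__ == "__main__":
    print(communities(label_propagation(build_graph(VECTORS, threshold=0.5))))
    # [['bus', 'car', 'truck'], ['cat', 'dog', 'pet']]
    print(hierarchical(VECTORS, thresholds=(0.5, 0.995)))
    # [[['bus'], ['car'], ['truck']], [['cat', 'dog', 'pet']]]
```

At the second level (threshold 0.995), {cat, dog, pet} stays together because both `cat` and `dog` remain strongly tied to `pet`, while {bus, car, truck} splits into singletons since none of its pairwise similarities reaches 0.995. Label propagation is used here only for brevity; modularity-based methods such as Louvain or Leiden are more common choices for weighted community detection.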

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
