CL · Dec 19, 2024

Graph-Convolutional Networks: Named Entity Recognition and Large Language Model Embedding in Document Clustering

arXiv:2412.14867v1 · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses document clustering for applications such as information retrieval. It is an incremental advance, building on existing graph-convolutional networks and LLM techniques.

The paper tackles document clustering by integrating Named Entity Recognition and Large Language Model embeddings into a graph-based framework. It reports improved performance over conventional methods, especially for documents with many named entities.

Recent advances in machine learning, particularly Large Language Models (LLMs) such as BERT and GPT, provide rich contextual embeddings that improve text representation. However, current document clustering approaches often ignore the deeper relationships between named entities (NEs) and the potential of LLM embeddings. This paper proposes a novel approach that integrates Named Entity Recognition (NER) and LLM embeddings within a graph-based framework for document clustering. The method builds a graph with nodes representing documents and edges weighted by named entity similarity, optimized using a graph-convolutional network (GCN). This ensures a more effective grouping of semantically related documents. Experimental results indicate that our approach outperforms conventional co-occurrence-based methods in clustering, notably for documents rich in named entities.
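
To make the pipeline described in the abstract concrete, here is a minimal sketch of a document graph weighted by named entity overlap, followed by one graph-convolution step. It is an illustration under stated assumptions, not the authors' implementation: Jaccard overlap stands in for the paper's named entity similarity, random vectors stand in for LLM embeddings, the layer weights are random rather than trained, and all function names (`entity_jaccard`, `build_adjacency`, `normalize_adjacency`, `gcn_layer`) are hypothetical.

```python
import numpy as np

def entity_jaccard(a, b):
    """Jaccard overlap between two sets of named entities (assumed similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def build_adjacency(entity_sets):
    """Weighted adjacency matrix: nodes are documents, edges are entity overlap."""
    n = len(entity_sets)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = entity_jaccard(entity_sets[i], entity_sets[j])
            adj[i, j] = adj[j, i] = w
    return adj

def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weights):
    """One graph-convolution step: aggregate neighbors, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)

# Toy corpus: named entities extracted per document (hypothetical data).
entity_sets = [
    {"Paris", "France", "Macron"},
    {"France", "Macron", "EU"},
    {"NASA", "Mars"},
    {"Mars", "NASA", "SpaceX"},
]

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))   # stand-in for LLM document embeddings
weights = rng.normal(size=(8, 4))    # untrained layer weights

adj = build_adjacency(entity_sets)
a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)  # one 4-dim vector per document
```

In a full system the node representations in `hidden` would be trained and then passed to a clustering step. The sketch only shows how entity overlap shapes the neighborhood each document aggregates over.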
