DL · AI · May 12

Reconnecting Fragmented Citation Networks with Semantic Augmentation

arXiv:2605.12263
AI Analysis

For researchers using citation networks for bibliometric analysis, this method offers a practical strategy to strengthen citation-based indicators without collapsing disciplinary boundaries.

The paper tackles fragmentation in citation graphs by proposing a hybrid framework that combines citation topology with LLM-based text similarity. Applied to 662,369 Web of Science publications, semantic augmentation substantially reduces fragmentation while preserving disciplinary homogeneity.

Citation graphs are fundamental tools for modeling scientific structure, but are often fragmented due to missing citations of scientifically connected articles. To address this issue, we propose a computationally efficient hybrid framework integrating citation topology with large language model (LLM)-based text similarity. Using 662,369 Web of Science publications in Mathematics and Operations Research & Management Science, we augment the original graph by adding semantic edges from small, disconnected components and weighting existing citations according to textual similarity. Semantic augmentation substantially reduces fragmentation while preserving disciplinary homogeneity. Compared to embedding-only clustering, cluster detection on augmented graphs using the Leiden algorithm retains structural interpretability while offering multi-scale organization. The method scales efficiently to large datasets and offers a practical strategy for strengthening citation-based indicators without collapsing disciplinary boundaries.
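The two augmentation steps described in the abstract (adding semantic edges from small, disconnected components and weighting existing citations by textual similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the paper's actual procedure: the function names, the toy 2-D vectors standing in for LLM embeddings, the `max_size` and `min_sim` thresholds, and the choice to link each node to its single most similar node outside its component.

```python
import math
from collections import defaultdict, deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def components(nodes, edges):
    """Connected components of an undirected graph, found by BFS."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            n = queue.popleft()
            comp.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        comps.append(comp)
    return comps


def augment(edges, emb, max_size=2, min_sim=0.8):
    """Hypothetical augmentation: weight citations, then link small components.

    max_size and min_sim are illustrative thresholds, not values from the paper.
    """
    nodes = sorted(emb)
    # Step 1: weight each existing citation edge by textual similarity.
    weighted = {tuple(sorted(e)): cosine(emb[e[0]], emb[e[1]]) for e in edges}
    # Step 2: for nodes in small components, add one semantic edge to the
    # most similar node outside the component, if it is similar enough.
    comps = components(nodes, edges)
    comp_of = {n: i for i, c in enumerate(comps) for n in c}
    semantic = {}
    for comp in comps:
        if len(comp) > max_size:
            continue
        for n in comp:
            outside = [m for m in nodes if comp_of[m] != comp_of[n]]
            if not outside:
                continue
            best = max(outside, key=lambda m: cosine(emb[n], emb[m]))
            sim = cosine(emb[n], emb[best])
            if sim >= min_sim:
                semantic[tuple(sorted((n, best)))] = sim
    return weighted, semantic


# Toy data: A-B-C form a citation chain; D and E are isolated papers.
emb = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.8, 0.2],
    "D": [0.98, 0.02],  # textually close to A
    "E": [0.0, 1.0],    # textually unrelated to the rest
}
edges = [("A", "B"), ("B", "C")]

weighted, semantic = augment(edges, emb)
before = len(components(sorted(emb), edges))
after = len(components(sorted(emb), edges + list(semantic)))
print(weighted, semantic, before, after)
```

In this toy graph the isolated paper D is reattached to A, while E stays disconnected because no candidate clears the similarity threshold. That mirrors the stated goal of reducing fragmentation without merging unrelated material. A community-detection step such as the Leiden algorithm would then run on the union of weighted citation edges and semantic edges.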

