Generating Synthetic Citation Networks with Communities

arXiv:2604.25597
21.3
Predicted impact: top 53% in SI · last 90 days
Originality: Incremental advance
AI Analysis

For researchers benchmarking community detection and network mining algorithms, this work provides a rigorous evaluation framework and a new efficient generator, though the improvements are incremental over existing methods.

The paper presents the first systematic comparison of 12 methods for generating directed, nearly acyclic synthetic citation networks with ground-truth communities, evaluated across 7 real networks and 26 metrics. It introduces the Citation Seeder (CS) algorithm, which achieves competitive results with up to four orders of magnitude fewer parameters than baselines.

Generating realistic synthetic citation, patent, or component dependency networks is essential for benchmarking community detection, graph visualisation, and network data mining algorithms. We present the first systematic comparison of generators of directed graphs that are nearly acyclic and have a ground-truth community structure. We evaluate 12 methods across 7 real citation networks and 26 metrics. We propose the practice of reversing directions of edges in static generators to break cycles and induce a citation-like flow, which significantly improves the performance of a degree-corrected Stochastic Block Model. Our novel methodological approach to evaluating community detection benchmarks distinguishes between endogenous and exogenous mesoscopic similarities, with the latter proving more important. This distinction reveals that high-parameter models overfit by memorising planted community statistics, which causes them to fail to produce realistic networks. Finally, we introduce the Citation Seeder (CS) algorithm, an iterative generator grounded in the Price-Pareto model of citation networks, with interpretable parameters and O(N+E) runtime. CS achieves competitive results against the best-performing baselines while using up to four orders of magnitude fewer parameters and providing a clean framework for explaining and predicting a network's future growth.
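
The abstract does not spell out how edge directions are reversed, so the following is only a minimal sketch of one plausible reading: give every node a rank standing in for publication time, then flip any edge that points from an older node to a newer one, so that every edge runs from the citing (newer) node to the cited (older) node. The function names, the use of the node index as the rank, and the uniform random input graph are all assumptions made for illustration and are not taken from the paper. This sketch also removes every cycle, whereas the paper targets graphs that are only nearly acyclic.

```python
import random
from collections import deque


def random_directed_edges(n, m, seed=0):
    """Hypothetical stand-in for a static generator: m distinct random directed edges."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:  # skip self-loops
            edges.add((u, v))
    return sorted(edges)


def orient_as_citations(edges, rank):
    """Flip edges so each one points from the higher-ranked (newer) node
    to the lower-ranked (older) node. Assumes all ranks are distinct.
    Reciprocal pairs collapse into a single edge."""
    oriented = set()
    for u, v in edges:
        if rank[u] < rank[v]:
            u, v = v, u
        oriented.add((u, v))
    return sorted(oriented)


def is_acyclic(n, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be peeled off."""
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(i for i in range(n) if indegree[i] == 0)
    peeled = 0
    while queue:
        u = queue.popleft()
        peeled += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return peeled == n


n = 50
raw = random_directed_edges(n, 200, seed=1)
rank = list(range(n))  # assumption: node index doubles as publication order
dag = orient_as_citations(raw, rank)
```

Because each oriented edge strictly decreases the rank, no directed cycle can survive, and both the reorientation and the acyclicity check take time linear in the number of nodes and edges. A generator that should remain only nearly acyclic would need to keep a small fraction of forward-pointing edges, which this sketch does not attempt.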

