AI · LG · Jun 8, 2023

arXiv4TGC: Large-Scale Datasets for Temporal Graph Clustering

arXiv:2306.04962v1 · 4 citations · h-index: 51 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a data bottleneck for researchers in temporal graph learning, though the contribution is incremental: it focuses on dataset creation rather than algorithmic innovation.

The paper tackles the lack of large-scale datasets for temporal graph clustering by introducing arXiv4TGC, a set of academic datasets with up to 1.3 million labeled nodes and 10 million temporal edges, which enables clearer evaluation of clustering models.

Temporal graph clustering (TGC) is a crucial task in temporal graph learning. It focuses on node clustering on temporal graphs, and the mechanism of temporal graph methods makes it more flexible for large-scale graph structures. However, the development of TGC is currently constrained by a significant problem: the lack of suitable and reliable large-scale temporal graph datasets for evaluating clustering performance. In other words, most existing temporal graph datasets are small, and even large-scale datasets contain only a limited number of available node labels. This makes it challenging to evaluate models for large-scale temporal graph clustering. To address this challenge, we build arXiv4TGC, a set of novel academic datasets (arXivAI, arXivCS, arXivMath, arXivPhy, and arXivLarge) for large-scale temporal graph clustering. In particular, the largest dataset, arXivLarge, contains 1.3 million labeled nodes and 10 million temporal edges. We further compare the clustering performance of typical temporal graph learning models on both classic temporal graph datasets and the new datasets proposed in this paper. Differences in clustering performance between models are more apparent on arXiv4TGC, which yields higher clustering confidence and makes the datasets better suited to large-scale temporal graph clustering. The arXiv4TGC datasets are publicly available at: https://github.com/MGitHubL/arXiv4TGC.
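To make the evaluation setting concrete, the sketch below shows one way a labeled temporal graph might be represented and how a clustering could be scored against node labels. It is an illustration only: the toy edge list, the labels, the predicted clusters, and the function names are all hypothetical, and normalized mutual information (NMI) is assumed here as a common clustering metric, not one named in the abstract. It does not reflect the arXiv4TGC file format.

```python
import math
from collections import Counter

# Hypothetical toy temporal graph: each edge is (source, target, timestamp).
# This is NOT the arXiv4TGC file format, only an illustrative stand-in.
temporal_edges = [
    (0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0),
    (3, 4, 4.0), (4, 5, 5.0), (3, 5, 6.0),
    (2, 3, 7.0),
]

# Hypothetical ground-truth node labels and a hypothetical model's clusters.
true_labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
pred_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0}


def entropy(assignments):
    """Shannon entropy (natural log) of a list of cluster assignments."""
    n = len(assignments)
    return -sum((c / n) * math.log(c / n) for c in Counter(assignments).values())


def nmi(labels_true, labels_pred):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    # Mutual information between the two partitions.
    mi = sum(
        (c / n) * math.log((c * n) / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both partitions are a single cluster, so they agree
    return mi / ((h_true + h_pred) / 2)


# Score only the nodes that actually appear in the temporal edge list.
nodes = sorted({u for u, _, _ in temporal_edges} | {v for _, v, _ in temporal_edges})
score = nmi([true_labels[v] for v in nodes], [pred_clusters[v] for v in nodes])
print(f"NMI on toy graph: {score:.3f}")
```

A score like this is only meaningful when enough nodes carry labels, which is the gap the abstract describes: a large graph with few labeled nodes leaves most of the clustering unevaluated.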

