LG | Mar 17, 2023

Data-Centric Learning from Unlabeled Graphs with Diffusion Model

arXiv:2303.10108v2 | 30 citations | h-index: 37
Originality: Highly original
AI Analysis

This work addresses the challenge of limited labeled data in graph learning for researchers and practitioners, offering a novel augmentation method that outperforms self-supervised learning approaches.

The paper tackles the problem of graph property prediction by proposing a data-centric approach that uses a diffusion model to generate task-specific labeled examples from unlabeled graphs, achieving significant performance improvements over fifteen existing methods on fifteen tasks.

Graph property prediction tasks are important and numerous. While each task offers only a small number of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is to train a model on the unlabeled graphs with self-supervised tasks and then fine-tune it on the prediction tasks. However, the knowledge learned from the self-supervised tasks may not align, and can sometimes conflict, with what the predictions need. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives that guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fifteen existing methods on fifteen tasks. Unlike in self-supervised learning, the performance improvement brought by the unlabeled data is visible in the form of the generated labeled examples.

Code Implementations: 1 repo
