CL, AI, LG · Jun 5, 2023

Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications

Amazon
arXiv:2306.02592v1 · 48 citations · h-index: 35
Originality: Incremental advance
AI Analysis

This work addresses the challenge of leveraging textual data in graph mining for various applications, representing an incremental advancement in integrating language and graph models.

The paper tackles the problem of pre-training text-plus-graph models on large heterogeneous graphs that carry textual information, and then fine-tuning them on downstream graph applications with different schemas. It proposes a graph-aware language model pre-training (GALM) framework that combines large language models with graph neural networks, and demonstrates its effectiveness through experiments on Amazon's internal datasets and on public datasets.

Model pre-training on large text corpora has been demonstrated to be effective for various downstream applications in the NLP domain. In the graph mining domain, an analogous idea is to pre-train graph models on large graphs in the hope of benefiting downstream graph applications, which has also been explored by several recent studies. However, no existing study has investigated pre-training text-plus-graph models on large heterogeneous graphs with abundant textual information (a.k.a. large graph corpora) and then fine-tuning the model on different related downstream applications with different graph schemas. To address this problem, we propose a framework of graph-aware language model pre-training (GALM) on a large graph corpus, which combines large language models with graph neural networks, together with a variety of fine-tuning methods for downstream applications. We conduct extensive experiments on Amazon's real internal datasets and on large public datasets. Comprehensive empirical results and in-depth analysis demonstrate the effectiveness of our proposed methods, along with the lessons learned.
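The abstract does not spell out the pre-training objective. As a rough illustration of what "graph-aware" language-model pre-training can mean, the sketch below encodes each node's text, mixes it with neighbour information in one message-passing step, and scores edges with a link-prediction loss. Everything here is an assumption, not GALM's implementation: a bag-of-words encoder stands in for the large language model, a single mean-aggregation layer stands in for the graph neural network, link prediction with sampled negative edges is assumed as the pre-training task, and the toy graph and all function names (build_vocab, encode_text, adjacency, mean_aggregate, link_prob, link_prediction_loss) are hypothetical. It computes only the forward pass and the loss; nothing is trained.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Edge = Tuple[str, str]


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to a column index."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode_text(text: str, vocab: Dict[str, int]) -> Vector:
    """Hypothetical stand-in for the language-model encoder: an L2-normalised bag of words."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def adjacency(edges: List[Edge]) -> Dict[str, List[str]]:
    """Undirected neighbour lists built from an edge list."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def mean_aggregate(h: Dict[str, Vector], adj: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One GNN-style layer (assumed): average each node's vector with its neighbours' mean."""
    out: Dict[str, Vector] = {}
    for node, vec in h.items():
        nbrs = adj.get(node, [])
        if not nbrs:
            out[node] = list(vec)  # isolated node keeps its text embedding
            continue
        mean = [sum(h[n][i] for n in nbrs) / len(nbrs) for i in range(len(vec))]
        out[node] = [0.5 * a + 0.5 * b for a, b in zip(vec, mean)]
    return out


def link_prob(hu: Vector, hv: Vector) -> float:
    """Edge probability as the sigmoid of the dot product of two node vectors."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(hu, hv))))


def link_prediction_loss(h: Dict[str, Vector], pos: List[Edge], neg: List[Edge]) -> float:
    """Mean binary cross-entropy over observed (pos) and sampled non-edges (neg)."""
    terms = [-math.log(link_prob(h[u], h[v])) for u, v in pos]
    terms += [-math.log(1.0 - link_prob(h[u], h[v])) for u, v in neg]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical toy graph corpus: one query node and two product nodes with text.
    texts = {
        "query": "wireless mouse",
        "item_a": "bluetooth wireless mouse",
        "item_b": "mechanical gaming keyboard",
    }
    edges = [("query", "item_a")]  # assumed observed edge, e.g. a click
    vocab = build_vocab(list(texts.values()))
    h0 = {n: encode_text(t, vocab) for n, t in texts.items()}
    h1 = mean_aggregate(h0, adjacency(edges))
    loss = link_prediction_loss(h1, pos=edges, neg=[("query", "item_b")])
    print(f"link-prediction loss: {loss:.4f}")
```

In a real setup the encoder would be a pre-trained language model, and its weights, together with the GNN's, would be updated by backpropagating a loss of this kind; the pre-trained model would then be fine-tuned on each downstream graph with its own schema.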
