LG · ML · Nov 9, 2022

Flaky Performances when Pretraining on Relational Databases

arXiv:2211.05213v1 · 3 citations · h-index: 44
Originality: Incremental advance
AI Analysis

The paper addresses a specific issue in graph representation learning for relational databases; the contribution is incremental because it builds on existing SSL and GNN techniques.

The paper tackles the problem of negative transfer in graph neural network self-supervised learning when pretraining on relational databases, finding that naive contrastive methods can degrade performance, and proposes InfoNode, a contrastive loss that improves results by maximizing mutual information between node representations.

We explore the downstream task performance of graph neural network (GNN) self-supervised learning (SSL) methods trained on subgraphs extracted from relational databases (RDBs). Intuitively, combining SSL and GNNs should make it possible to leverage more of the available data, which could translate into better results. However, we found that naively porting contrastive SSL techniques can cause "negative transfer": linear evaluation on fixed representations from a pretrained model performs worse than on representations from the randomly initialized model. Based on the conjecture that contrastive SSL conflicts with the message-passing layers of the GNN, we propose InfoNode: a contrastive loss that aims to maximize the mutual information between a node's initial- and final-layer representations. The primary empirical results support our conjecture and the effectiveness of InfoNode.
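
To make the idea of contrasting a node's initial- and final-layer representations concrete, here is a minimal sketch using a generic InfoNCE-style objective, a standard lower-bound estimator of mutual information. The abstract does not give the exact form of the InfoNode loss, so the function name `info_nce_initial_final`, the cosine similarity, the `temperature` parameter, and the requirement that both representations share one dimensionality are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce_initial_final(h_initial, h_final, temperature=0.5):
    """InfoNCE-style loss between initial- and final-layer node embeddings.

    h_initial[i] and h_final[i] are the embeddings of node i before and
    after the message-passing layers (assumed to have the same dimension).
    For node i, the positive pair is (h_initial[i], h_final[i]); the
    final-layer embeddings of all other nodes act as negatives.
    Lower loss means the two views of each node are easier to match.
    """
    if not h_initial or len(h_initial) != len(h_final):
        raise ValueError("need two non-empty lists with one row per node")

    anchors = [_normalize(v) for v in h_initial]
    targets = [_normalize(v) for v in h_final]

    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarity of node i's initial embedding to every
        # final-layer embedding, scaled by the temperature.
        logits = [
            sum(a * b for a, b in zip(anchor, target)) / temperature
            for target in targets
        ]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_norm = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Cross-entropy with the matching node as the correct class.
        total += log_norm - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    initial = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # final layer preserves node identity
    swapped = [[0.0, 1.0], [1.0, 0.0]]   # final layer confuses the two nodes
    print(info_nce_initial_final(initial, aligned))
    print(info_nce_initial_final(initial, swapped))
```

In this toy example the loss is lower when each final-layer embedding stays closest to its own node's initial embedding. In a real GNN the input features and the final layer usually differ in dimension, so a learned projection would typically be needed before such a comparison.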
