PanRep: Graph neural networks for extracting universal node embeddings in heterogeneous graphs
This work addresses the need for versatile node representations in graph analysis by offering a pretrained model that can be fine-tuned. The contribution appears incremental, as it builds on existing GNN and unsupervised learning concepts.
The authors address the problem of learning universal node embeddings for heterogeneous graphs and introduce PanRep, a GNN model that outperforms unsupervised and some supervised methods in node classification and link prediction, especially when labeled data is limited. In a case study, the model identifies potential Covid-19 drug candidates.
Learning unsupervised node embeddings facilitates several downstream tasks such as node classification and link prediction. A node embedding is universal if it is designed to be used by and benefit various downstream tasks. This work introduces PanRep, a graph neural network (GNN) model, for unsupervised learning of universal node representations for heterogeneous graphs. PanRep consists of a GNN encoder that obtains node embeddings and four decoders, each capturing different topological and node feature properties. By abiding by these properties, the novel unsupervised framework learns universal embeddings applicable to different downstream tasks. PanRep can be further fine-tuned to account for possibly limited labels. In this operational setting, PanRep serves as a pretrained model for extracting node embeddings of heterogeneous graph data. PanRep outperforms all unsupervised and certain supervised methods in node classification and link prediction, especially when the labeled data available to the supervised methods is scarce. PanRep-FT (with fine-tuning) outperforms all other supervised approaches, which corroborates the merits of pretraining models. Finally, we apply PanRep-FT to discover novel drugs for Covid-19. We showcase the advantage of universal embeddings in drug repurposing and identify several drugs used in clinical trials as possible drug candidates.
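The encoder-plus-decoders design described in the abstract can be illustrated with a small sketch. The code below is not PanRep's implementation: the abstract does not name the four decoders, so two generic self-supervised objectives (feature reconstruction and link scoring) stand in for them, and the encoder is a single round of relation-wise mean aggregation rather than a full GNN. All function names, the toy graph, and the per-relation weights are hypothetical, and the gradient-based training step is omitted.

```python
import math

def encode(features, edges_by_type, rel_weight):
    """Toy encoder: one round of relation-wise mean aggregation.

    features:      dict node -> list of floats (input node features)
    edges_by_type: dict relation -> list of (src, dst) pairs
    rel_weight:    dict relation -> scalar weight (hypothetical parameter)
    """
    dim = len(next(iter(features.values())))
    emb = {}
    for node, x in features.items():
        h = list(x)  # start from the node's own features
        for rel, edges in edges_by_type.items():
            nbrs = [src for src, dst in edges if dst == node]
            if not nbrs:
                continue
            for k in range(dim):
                mean_k = sum(features[src][k] for src in nbrs) / len(nbrs)
                h[k] += rel_weight[rel] * mean_k
        emb[node] = [math.tanh(v) for v in h]
    return emb

def feature_reconstruction_loss(emb, features):
    """Illustrative decoder 1: mean squared error against input features."""
    total, count = 0.0, 0
    for node, x in features.items():
        for a, b in zip(emb[node], x):
            total += (a - b) ** 2
            count += 1
    return total / count

def link_loss(emb, edges_by_type):
    """Illustrative decoder 2: logistic loss on dot-product edge scores."""
    total, count = 0.0, 0
    for edges in edges_by_type.values():
        for src, dst in edges:
            score = sum(a * b for a, b in zip(emb[src], emb[dst]))
            total += math.log(1.0 + math.exp(-score))
            count += 1
    return total / count

def multi_task_loss(emb, features, edges_by_type):
    """Sum the decoder losses so one embedding serves several objectives."""
    return (feature_reconstruction_loss(emb, features)
            + link_loss(emb, edges_by_type))

# Hypothetical heterogeneous graph: two authors, two papers, two relations.
features = {
    "a1": [1.0, 0.0],
    "a2": [0.0, 1.0],
    "p1": [0.5, 0.5],
    "p2": [0.2, 0.8],
}
edges_by_type = {
    "writes": [("a1", "p1"), ("a2", "p1")],
    "cites": [("p1", "p2")],
}
rel_weight = {"writes": 0.5, "cites": 0.3}

emb = encode(features, edges_by_type, rel_weight)
loss = multi_task_loss(emb, features, edges_by_type)
```

Summing the decoder losses is what pushes a single embedding to carry information useful for more than one downstream task. In a real setting, the encoder parameters would be updated by backpropagating this combined loss, and fine-tuning would add a supervised loss on the available labels.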