LG · Oct 1, 2022

Heterogeneous Graph Contrastive Multi-view Learning

Zehong Wang, Qi Li, Donghua Yu, Xiaolong Han, Xiao-Zhi Gao, Shigen Shen

arXiv:2210.00248v214.153 citations · h-index: 15 · Has Code

Originality: Highly original

AI Analysis

This work addresses the underdeveloped area of graph contrastive learning for heterogeneous networks, which is important for tasks like node classification and link prediction in domains with complex relational data.

The paper applies contrastive learning to heterogeneous information networks, addressing challenges in augmentation, contrastive objective design, and sampling bias. The resulting model consistently outperforms state-of-the-art baselines on five benchmark datasets.

Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on Heterogeneous Information Networks (HINs) is still in its infancy. For example, it is unclear how to augment HINs without substantially altering the underlying semantics, and how to design the contrastive objective to fully capture the rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques are empirically shown to be inadequate for GCL. How to mitigate the sampling bias for heterogeneous GCL is another important problem. To address these challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as the augmentation to generate multiple subgraphs as multi-views, and propose a contrastive objective that maximizes the mutual information between any pair of metapath-induced views. To alleviate the sampling bias, we further propose a positive sampling strategy that explicitly selects positives for each node by jointly considering the semantic and structural information preserved in each metapath view. Extensive experiments demonstrate that HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.
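The idea of metapath-induced views and a cross-view contrastive objective can be illustrated with a minimal sketch. This is not the authors' implementation. The toy paper/author/subject network, the one-step mean-aggregation encoder, the InfoNCE loss as a stand-in mutual-information bound, and all function names (`metapath_adjacency`, `encode`, `info_nce`) are assumptions made for illustration. The positive sampling strategy from the abstract is not covered here: each node's only positive is its own copy in the other view.

```python
import numpy as np

def metapath_adjacency(*relations):
    """Compose relation matrices along a metapath into a binary adjacency matrix."""
    product = relations[0]
    for rel in relations[1:]:
        product = product @ rel
    adj = (product > 0).astype(float)
    np.fill_diagonal(adj, 0.0)  # drop self-loops
    return adj

def encode(adj, features):
    """Hypothetical encoder: one step of mean aggregation with self-loops."""
    adj_self = adj + np.eye(adj.shape[0])
    return (adj_self / adj_self.sum(axis=1, keepdims=True)) @ features

def info_nce(z_a, z_b, temperature=0.5):
    """InfoNCE loss: node i in view a is pulled toward node i in view b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy HIN (made-up data): 4 papers, 3 authors, 2 subjects.
paper_author = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
paper_subject = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

# Two metapath-induced views over paper nodes:
# Paper-Author-Paper (PAP) and Paper-Subject-Paper (PSP).
adj_pap = metapath_adjacency(paper_author, paper_author.T)
adj_psp = metapath_adjacency(paper_subject, paper_subject.T)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # random stand-in node features

z_pap = encode(adj_pap, features)
z_psp = encode(adj_psp, features)

# Symmetric contrastive loss between the two views.
loss = 0.5 * (info_nce(z_pap, z_psp) + info_nce(z_psp, z_pap))
print(f"cross-view InfoNCE loss: {loss:.4f}")
```

With more than two metapaths, the same loss would be summed over every pair of views, which matches the abstract's description of contrasting any pair of metapath-induced views.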
