LG | Aug 21, 2022

Heterogeneous Graph Masked Autoencoders

Yijun Tian, Kaiwen Dong, Chunhui Zhang, Chuxu Zhang, Nitesh V. Chawla

arXiv:2208.09957v225.3127 citations | h-index: 75 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of learning representations for heterogeneous graphs, which are common in real-world applications, but it appears incremental as it builds on existing masked autoencoder paradigms for graphs.

The paper tackles the problem of generative self-supervised learning on heterogeneous graphs by proposing HGMAE, a masked autoencoder model that addresses challenges in capturing complex structure, node attributes, and positions, and it outperforms state-of-the-art baselines on multiple datasets.

Generative self-supervised learning (SSL), especially masked autoencoders, has become one of the most exciting learning paradigms and has shown great potential in handling graph data. However, real-world graphs are always heterogeneous, which poses three critical challenges that existing methods ignore: 1) how to capture complex graph structure? 2) how to incorporate various node attributes? and 3) how to encode different node positions? In light of this, we study the problem of generative SSL on heterogeneous graphs and propose HGMAE, a novel heterogeneous graph masked autoencoder model to address these challenges. HGMAE captures comprehensive graph information via two innovative masking techniques and three unique training strategies. In particular, we first develop metapath masking and adaptive attribute masking with dynamic mask rate to enable effective and stable learning on heterogeneous graphs. We then design several training strategies including metapath-based edge reconstruction to adopt complex structural information, target attribute restoration to incorporate various node attributes, and positional feature prediction to encode node positional information. Extensive experiments demonstrate that HGMAE outperforms both contrastive and generative state-of-the-art baselines on several tasks across multiple datasets. Codes are available at https://github.com/meettyj/HGMAE.
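As a rough illustration of the "adaptive attribute masking with dynamic mask rate" idea mentioned in the abstract, the sketch below masks the attribute vectors of a random subset of nodes, with a mask rate that changes over training. The abstract does not specify the schedule or the mask token, so the linear schedule, the `start`/`end` rates, the zero mask token, and the function names `dynamic_mask_rate` and `mask_attributes` are all assumptions made for illustration, not the authors' implementation (see the linked repository for that).

```python
import random


def dynamic_mask_rate(epoch, total_epochs, start=0.5, end=0.8):
    """Mask rate for a given epoch.

    ASSUMPTION: a linear schedule from `start` to `end`; the abstract only
    says the rate is dynamic and does not give the schedule.
    """
    if total_epochs <= 1:
        return end
    t = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + t * (end - start)


def mask_attributes(features, rate, mask_token, rng):
    """Replace the attribute vectors of a random subset of nodes.

    features:   list of per-node attribute vectors (lists of floats)
    rate:       fraction of nodes to mask, in [0, 1]
    mask_token: vector substituted for masked nodes (ASSUMPTION: fixed here;
                in a trained model it would typically be learnable)
    rng:        a random.Random instance, for reproducibility

    Returns (masked_features, masked_indices). The input is not modified.
    """
    n = len(features)
    k = int(round(rate * n))
    masked_indices = sorted(rng.sample(range(n), k))
    masked_set = set(masked_indices)
    masked_features = [
        list(mask_token) if i in masked_set else list(row)
        for i, row in enumerate(features)
    ]
    return masked_features, masked_indices


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[float(i), float(i) + 0.5] for i in range(10)]
    for epoch in (0, 5, 9):
        rate = dynamic_mask_rate(epoch, total_epochs=10)
        masked, idx = mask_attributes(feats, rate, [0.0, 0.0], rng)
        print(f"epoch {epoch}: rate={rate:.2f}, masked nodes={idx}")
```

In a masked autoencoder setup, a reconstruction loss would then be computed only on the masked nodes, which is what "target attribute restoration" suggests; that part is omitted here because it depends on the encoder and decoder.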

