SI LG ML | Aug 11, 2023

Node Embedding for Homophilous Graphs with ARGEW: Augmentation of Random walks by Graph Edge Weights

Jun Hee Kim, Jaeman Son, Hyunsoo Kim, Eunjo Lee

arXiv:2308.05957v13.33 citations | h-index: 5

Originality: Incremental advance

AI Analysis

This paper addresses a specific issue in graph representation learning for weighted networks and offers an incremental improvement to existing methods.

The paper tackles a shortcoming of existing random walk methods on weighted homophilous graphs: the resulting node embeddings do not adequately reflect edge weights. It proposes ARGEW, an augmentation method that makes node pairs with larger edge weights embed closer together and achieves competitive node classification results without node features or labels.

Representing nodes in a network as dense vectors (node embeddings) is important for understanding a given network and solving many downstream tasks. In particular, for weighted homophilous graphs, where similar nodes are connected with larger edge weights, we desire node embeddings in which node pairs with strong weights have closer embeddings. Although random-walk-based node embedding methods such as node2vec and node2vec+ do work for weighted networks by including edge weights in the walk transition probabilities, our experiments show that the resulting embeddings do not adequately reflect edge weights. In this paper, we propose ARGEW (Augmentation of Random walks by Graph Edge Weights), a novel augmentation method for random walks that expands the corpus in such a way that nodes with larger edge weights end up with closer embeddings. ARGEW can work with any random-walk-based node embedding method, because it is independent of the random sampling strategy itself and works on top of the already-performed walks. On several real-world networks, we demonstrate that the desired pattern, namely that node pairs with larger edge weights have closer embeddings, is much clearer with ARGEW than without it. We also examine ARGEW's performance in node classification: node2vec with ARGEW outperforms pure node2vec and is not sensitive to hyperparameters (i.e., it is consistently good). In fact, it achieves results comparable to supervised GCN, even without any node feature or label information during training. Finally, we explain why ARGEW works consistently well by exploring the coappearance distributions using a synthetic graph with clear structural roles.
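The abstract does not spell out the augmentation rule, so the following is only a minimal Python sketch of the general idea, not the authors' algorithm. It assumes a toy three-node graph, a first-order weighted walk (no node2vec return or in-out parameters), and a hypothetical rule, `augment_by_weight`, that appends extra two-node "sentences" to the already-sampled corpus, with more copies for heavier edges. All function names, the `max_copies` parameter, and the graph are illustrative assumptions.

```python
import random
from collections import Counter

# Toy weighted, undirected graph (assumed for illustration): a-b is the strong edge.
graph = {
    "a": {"b": 5.0, "c": 1.0},
    "b": {"a": 5.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0},
}

def weighted_walk(graph, start, length, rng):
    """One first-order walk; transition probability is proportional to edge weight."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = graph[walk[-1]]
        if not nbrs:
            break
        nodes = list(nbrs)
        weights = [nbrs[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

def augment_by_weight(graph, walks, max_copies=3):
    """Hypothetical augmentation (NOT the paper's rule): for each consecutive
    pair (u, v) in the existing walks, append copies of the sentence [u, v],
    with the number of copies scaled by the edge weight."""
    w_max = max(w for nbrs in graph.values() for w in nbrs.values())
    extra = []
    for walk in walks:
        for u, v in zip(walk, walk[1:]):
            copies = round(max_copies * graph[u][v] / w_max)
            extra.extend([[u, v]] * copies)
    return walks + extra

def coappearance(corpus, window=2):
    """Count how often two distinct nodes appear within `window` steps of each other."""
    counts = Counter()
    for sentence in corpus:
        for i, u in enumerate(sentence):
            for v in sentence[i + 1 : i + 1 + window]:
                if u != v:
                    counts[frozenset((u, v))] += 1
    return counts

rng = random.Random(0)
walks = [weighted_walk(graph, node, 10, rng) for node in graph for _ in range(10)]
augmented = augment_by_weight(graph, walks)

base = coappearance(walks)
aug = coappearance(augmented)
for pair in (("a", "b"), ("b", "c")):
    key = frozenset(pair)
    print(pair, "before:", base[key], "after:", aug[key])
```

Because the augmentation only reads the sampled walks, it can sit on top of any walk sampler, which mirrors the independence property the abstract describes. The augmented corpus could then be passed to a skip-gram trainer in place of the original walks; that step is omitted here.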
