LG · AI · Jul 29, 2024

Noise-Resilient Unsupervised Graph Representation Learning via Multi-Hop Feature Quality Estimation

Shiyuan Li, Yixin Liu, Qingfeng Chen, Geoffrey I. Webb, Shirui Pan

arXiv:2407.19944v1 · 17.616 citations · h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses a practical limitation of graph neural networks in real-world applications, where node features are often noisy, though it is an incremental improvement over existing methods.

The paper tackles the problem of noisy node features in unsupervised graph representation learning, which degrade the quality of learned representations. It proposes a multi-hop feature quality estimation method that learns noise-resilient representations by estimating the quality of propagated features at different hops. The method reportedly achieves state-of-the-art performance on multiple real-world datasets with diverse noise types.

Unsupervised graph representation learning (UGRL) based on graph neural networks (GNNs) has received increasing attention owing to its efficacy in handling graph-structured data. However, existing UGRL methods assume that node features are noise-free, so they fail to distinguish useful information from noise when applied to real data with noisy features, which degrades the quality of the learned representations. This motivates us to take noisy node features into account in real-world UGRL. Through empirical analysis, we reveal that feature propagation, the essential operation in GNNs, acts as a "double-edged sword" in handling noisy features: it can both denoise and diffuse noise, so feature quality varies across nodes and even within the same node at different hops. Building on this insight, we propose a novel UGRL method based on Multi-hop feature Quality Estimation (MQE for short). Unlike most UGRL models, which directly use propagation-based GNNs to generate representations, our approach learns representations by estimating the quality of propagated features at different hops. Specifically, we introduce a Gaussian model that uses a learnable "meta-representation" as a condition to estimate the expectation and variance of multi-hop propagated features via neural networks. In this way, the meta-representation captures the semantic and structural information underlying multiple propagated features but is naturally less susceptible to interference from noise, and thus serves as a high-quality node representation for downstream tasks. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of MQE in learning reliable node representations in scenarios with diverse types of feature noise.
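The two ingredients named in the abstract, multi-hop feature propagation and a Gaussian estimate of each propagated feature, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a symmetric-normalized adjacency matrix with self-loops (a common GNN choice that the abstract does not specify) and a standard Gaussian negative log-likelihood as the scoring function. The names normalized_adjacency, propagate, and gaussian_nll are hypothetical, and the mean and log-variance arguments stand in for the outputs of the neural networks conditioned on the meta-representation.

```python
import math


def normalized_adjacency(edges, n):
    """Dense D^{-1/2} (A + I) D^{-1/2} for an undirected graph (assumed propagation matrix)."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(a_hat, x, hops):
    """Return the list [X, A_hat X, A_hat^2 X, ...] with hops + 1 entries."""
    n, dim = len(x), len(x[0])
    out = [x]
    for _ in range(hops):
        prev = out[-1]
        out.append([
            [sum(a_hat[i][k] * prev[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)
        ])
    return out


def gaussian_nll(target, mean, log_var):
    """Mean per-dimension Gaussian negative log-likelihood of one feature vector.

    In the setting described above, mean and log_var would be predicted from a
    node's meta-representation; here they are plain lists (an assumption).
    """
    total = 0.0
    for t, m, lv in zip(target, mean, log_var):
        total += 0.5 * (lv + (t - m) ** 2 / math.exp(lv) + math.log(2.0 * math.pi))
    return total / len(target)


# Path graph 0 - 1 - 2; node 1 carries an outlier (noisy) first feature.
edges = [(0, 1), (1, 2)]
a_hat = normalized_adjacency(edges, 3)
x = [[1.0, 0.0], [5.0, 0.0], [1.0, 0.0]]
multi_hop = propagate(a_hat, x, hops=2)
```

In this toy graph, one hop of propagation pulls the outlier at node 1 toward its neighbors' values while raising the value at node 0, which mirrors the denoise-and-diffuse behavior the abstract describes. A larger predicted log-variance lowers the penalty for a large residual, so under this assumed objective the variance can act as a per-hop quality estimate.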
