LGSep 9, 2024

Re-evaluating the Advancements of Heterophilic Graph Learning

Sitao Luan, Qincheng Lu, Chenqing Hua, Xinyu Wang, Jiaqi Zhu, Xiao-Wen Chang

arXiv:2409.05755v210.46 citationsh-index: 12

Originality Synthesis-oriented

AI Analysis

This work addresses evaluation challenges for researchers in graph machine learning, but it is incremental as it focuses on improving assessment rather than proposing new models.

The paper tackled the problem of evaluating graph neural networks (GNNs) on heterophilic graphs by identifying pitfalls in existing benchmarks and methods, resulting in a taxonomy of datasets and re-evaluated SOTA models with fine-tuned hyperparameters, showing performance variations across different heterophilic groups.

Over the past decade, Graph Neural Networks (GNNs) have achieved great success on machine learning tasks with relational data. However, recent studies have found that heterophily can cause significant performance degradation of GNNs, especially on node-level tasks. Numerous heterophilic benchmark datasets have been put forward to validate the efficacy of heterophily-specific GNNs, and various homophily metrics have been designed to help recognize these challenging datasets. Nevertheless, there still exist multiple pitfalls that severely hinder the proper evaluation of new models and metrics: 1) lack of hyperparameter tuning; 2) insufficient evaluation on the truly challenging heterophilic datasets; 3) missing quantitative evaluation for homophily metrics on synthetic graphs. To overcome these challenges, we first train and fine-tune baseline models on $27$ most widely used benchmark datasets, and categorize them into three distinct groups: malignant, benign and ambiguous heterophilic datasets. We identify malignant and ambiguous heterophily as the truly challenging subsets of tasks, and to our best knowledge, we are the first to propose such taxonomy. Then, we re-evaluate $11$ state-of-the-arts (SOTA) GNNs, covering six popular methods, with fine-tuned hyperparameters on different groups of heterophilic datasets. Based on the model performance, we comprehensively reassess the effectiveness of different methods on heterophily. At last, we evaluate $11$ popular homophily metrics on synthetic graphs with three different graph generation approaches. To overcome the unreliability of observation-based comparison and evaluation, we conduct the first quantitative evaluation and provide detailed analysis.

View on arXiv PDF

Similar