Revisiting Edge Perturbation for Graph Neural Network in Graph Data Augmentation and Attack
This work addresses a foundational ambiguity in graph machine learning. It benefits researchers and practitioners by enabling more precise and efficient manipulation of graph data, though its contribution is incremental, as it refines existing perturbation methods.
The paper tackles the dual role of edge perturbation in graph neural networks (GNNs): the same operation can either enhance performance through data augmentation or degrade it through attacks. It proposes a unified formulation and an Edge Priority Detector (EPD) that clarify the boundary between the two and enable flexible control, achieving comparable or superior performance with less time overhead.
Edge perturbation is a basic method for modifying graph structures. Edge perturbation methods can be categorized into two veins based on their effects on the performance of graph neural networks (GNNs): graph data augmentation and attack. Surprisingly, both veins employ the same operations, yet they have opposite effects on GNNs' accuracy. A distinct boundary between the two in their use of edge perturbation has never been clearly defined. Consequently, inappropriate perturbations may lead to undesirable outcomes, so precise adjustments are needed to achieve the desired effects. The questions "why does edge perturbation have a two-faced effect?" and "what makes edge perturbation flexible and effective?" therefore remain unanswered. In this paper, we answer these questions by proposing a unified formulation and establishing a clear boundary between the two categories of edge perturbation methods. Specifically, we conduct experiments to elucidate the differences and similarities between these methods, and we theoretically unify their workflow by casting it as a single optimization problem. We then devise the Edge Priority Detector (EPD) to generate a novel priority metric that bridges these methods within the workflow. Experiments show that EPD can perform augmentation or attack flexibly and achieve comparable or superior performance to its counterparts with less time overhead.
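The central observation above is that augmentation and attack apply the same add/remove operations and differ only in which edges they target. The following minimal sketch illustrates that observation. It assumes a hypothetical priority score, cosine similarity of node features (a common homophily heuristic), in place of EPD's metric, which the abstract does not specify. The names `perturb_edges`, `budget`, and `mode` are illustrative and do not come from the authors' code.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def perturb_edges(edges, features, budget, mode):
    """Flip the `budget` highest-priority node pairs of an undirected graph.

    edges:    set of (i, j) tuples with i < j
    features: list of feature vectors, one per node
    mode:     "augment" or "attack"

    HYPOTHETICAL priority (not the paper's EPD metric): feature cosine
    similarity. Under "augment", dissimilar existing edges and similar
    missing edges rank highest; "attack" reverses that ranking. Both modes
    then apply the identical flip operation.
    """
    if mode not in ("augment", "attack"):
        raise ValueError("mode must be 'augment' or 'attack'")
    candidates = []
    for i, j in combinations(range(len(features)), 2):
        s = cosine(features[i], features[j])
        present = (i, j) in edges
        # Augmentation gain: removing a dissimilar edge or adding a similar one.
        gain = -s if present else s
        if mode == "attack":
            gain = -gain  # same candidates, opposite ranking direction
        candidates.append((gain, i, j))
    candidates.sort(reverse=True)
    new_edges = set(edges)
    for _, i, j in candidates[:budget]:
        new_edges ^= {(i, j)}  # flip: remove if present, add if absent
    return new_edges


# Two feature "classes": nodes 0, 1 versus nodes 2, 3.
feats = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(perturb_edges({(0, 2), (1, 3)}, feats, 2, "augment"))
print(perturb_edges({(0, 1), (2, 3)}, feats, 2, "attack"))
```

Changing `mode` only reverses the ranking, so the same budget and the same flip operation yield either augmentation (adding within-class edges) or attack (removing them). A faithful reproduction would replace `cosine` with the paper's EPD priority and its optimization formulation.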