LG, Dec 4, 2023
The Self-Loop Paradox: Investigating the Impact of Self-Loops on Graph Neural Networks
Moritz Lampert, Ingo Scholtes
Many Graph Neural Networks (GNNs) add self-loops to a graph to include feature information about a node itself at each layer. However, if the GNN consists of more than one layer, this information can return to its origin via cycles in the graph topology. Intuition suggests that this "backflow" of information should be larger in graphs with self-loops compared to graphs without. In this work, we counter this intuition and show that for certain GNN architectures, the information a node gains from itself can be smaller in graphs with self-loops compared to the same graphs without. We adopt an analytical approach for the study of statistical graph ensembles with a given degree sequence and show that this phenomenon, which we call the self-loop paradox, can depend both on the number of GNN layers $k$ and whether $k$ is even or odd. We experimentally validate our theoretical findings in a synthetic node classification task and investigate its practical relevance in 23 real-world graphs.
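The effect described in this abstract can be illustrated with a small numerical sketch. It is not the authors' analysis: it assumes a simple row-normalized (random-walk) message-passing operator, and the helper `self_return_weight` and the 4-node cycle graph are hypothetical choices made only for illustration. The diagonal of the k-th power of the operator serves here as a proxy for how much of a node's own features return to it after k layers.

```python
import numpy as np


def self_return_weight(adj, k, self_loops):
    """Diagonal of P^k, where P is the row-normalized adjacency matrix.

    Assumption: a k-layer GNN is approximated by k applications of the
    random-walk operator P = D^{-1} A (optionally with A + I).
    """
    a = adj.astype(float)
    if self_loops:
        a = a + np.eye(a.shape[0])  # add a self-loop to every node
    p = a / a.sum(axis=1, keepdims=True)  # row-normalize by (new) degree
    return np.diag(np.linalg.matrix_power(p, k))


# Hypothetical toy graph: an undirected cycle on 4 nodes (bipartite).
cycle4 = np.array(
    [
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
    ]
)

for k in (1, 2, 3):
    without = self_return_weight(cycle4, k, self_loops=False)[0]
    with_loops = self_return_weight(cycle4, k, self_loops=True)[0]
    print(f"k={k}: without self-loops={without:.3f}, with self-loops={with_loops:.3f}")
```

Under these assumptions, for k = 2 the self-weight is 1/2 without self-loops but only 1/3 with them, while for odd k the bipartite graph without self-loops returns nothing to the origin. This mirrors the dependence on the number of layers and on the parity of k that the abstract mentions, though only for this one toy graph and operator.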
LG, Jun 7, 2024
From Link Prediction to Forecasting: Addressing Challenges in Batch-based Temporal Graph Learning
Moritz Lampert, Christopher Blöcker, Ingo Scholtes
Dynamic link prediction is an important problem considered in many recent works that propose approaches for learning temporal edge patterns. To assess their efficacy, models are evaluated on continuous-time and discrete-time temporal graph datasets, typically using a traditional batch-oriented evaluation setup. However, as we show in this work, a batch-oriented evaluation is often unsuitable and can cause several issues. Grouping edges into fixed-sized batches regardless of their occurrence time leads to information loss or leakage, depending on the temporal granularity of the data. Furthermore, fixed-size batches create time windows with different durations, resulting in an inconsistent dynamic link prediction task. In this work, we empirically show how traditional batch-based evaluation leads to skewed model performance and hinders the fair comparison of methods. We mitigate this problem by reformulating dynamic link prediction as a link forecasting task that better accounts for temporal information present in the data.
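The batching issue described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the evaluation code from the paper: the timestamps are made up, and `fixed_size_batches` and `fixed_duration_windows` are hypothetical helpers. The sketch contrasts grouping edges by count with grouping them by a fixed time horizon.

```python
def fixed_size_batches(timestamps, batch_size):
    """Group sorted edge timestamps into batches of a fixed number of edges."""
    return [
        timestamps[i : i + batch_size]
        for i in range(0, len(timestamps), batch_size)
    ]


def fixed_duration_windows(timestamps, horizon):
    """Group sorted edge timestamps into windows of a fixed time span.

    Windows that contain no edges are skipped in this sketch.
    """
    windows = {}
    start = timestamps[0]
    for t in timestamps:
        windows.setdefault((t - start) // horizon, []).append(t)
    return [windows[index] for index in sorted(windows)]


# Made-up, bursty edge timestamps (assumption for illustration only).
timestamps = [0, 1, 2, 3, 10, 50, 51, 200]

batches = fixed_size_batches(timestamps, batch_size=4)
durations = [batch[-1] - batch[0] for batch in batches]
print("fixed-size batches:", batches)
print("time span covered by each batch:", durations)

windows = fixed_duration_windows(timestamps, horizon=100)
print("fixed-duration windows:", windows)
print("edges per window:", [len(window) for window in windows])
```

With these toy timestamps, two batches of four edges each cover time spans of 3 and 190 units, so the prediction task differs from batch to batch. Fixed-duration windows keep the horizon constant at the cost of a varying number of edges per window.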