LG SI SOC-PH ML Sep 30, 2020

The Role of Isomorphism Classes in Multi-Relational Datasets

Vijja Wichitwechkarn, Ben Day, Cristian Bodnar, Matthew Wales, Pietro Liò

arXiv:2009.14593v1

Originality: Incremental advance

AI Analysis

This paper addresses a methodological flaw in how graph neural networks for multi-interaction systems are evaluated. The advance is incremental, but it matters to researchers working on unsupervised dynamics prediction.

The paper tackles the problem of overestimated performance in neural relational inference models due to isomorphism leakage in synthetic multi-relational datasets, and proposes isomorphism-aware benchmarks that reveal a threshold sampling frequency for successful learning and improve model performance, stability, and training time.

Multi-interaction systems abound in nature, from colloidal suspensions to gene regulatory circuits. These systems can produce complex dynamics, and graph neural networks have been proposed as a method to extract underlying interactions and predict how systems will evolve. However, the current training and evaluation procedures for these models, which rely on synthetic multi-relational datasets, are agnostic to interaction network isomorphism classes, which produce identical dynamics up to initial conditions. We extensively analyse how isomorphism class awareness affects these models, focusing on neural relational inference (NRI) models, which are unique in explicitly inferring interactions to predict dynamics in the unsupervised setting. Specifically, we demonstrate that isomorphism leakage overestimates performance in multi-relational inference and that sampling biases present in the multi-interaction network generation process can impair generalisation. To remedy this, we propose isomorphism-aware synthetic benchmarks for model evaluation. We use these benchmarks to test generalisation abilities and demonstrate the existence of a threshold sampling frequency of isomorphism classes for successful learning. In addition, we demonstrate that isomorphism classes can be utilised through a simple prioritisation scheme to improve model performance and stability during training and to reduce training time.
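To make the idea of isomorphism leakage concrete, the following is a minimal sketch, not the authors' benchmark code. It assumes small undirected interaction graphs with a single edge type (the paper's setting is multi-relational), and the helper names `canonical_form`, `sample_graph`, and `split_by_class` are hypothetical. It groups randomly sampled graphs by isomorphism class using a brute-force canonical form, then splits train and test sets by class so that no class appears in both.

```python
from collections import Counter
from itertools import combinations, permutations
import random


def canonical_form(n, edges):
    """Canonical key for an undirected simple graph on nodes 0..n-1.

    Brute force over all n! relabelings, so only feasible for small n.
    Two graphs are isomorphic exactly when their keys are equal.
    """
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(
            tuple(sorted((perm[u], perm[v]))) for u, v in edges
        ))
        if best is None or relabeled < best:
            best = relabeled
    return best


def sample_graph(n, p, rng):
    """Include each possible edge independently with probability p (assumed generator)."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def split_by_class(graphs, n, test_fraction, rng):
    """Split graphs so that each isomorphism class lands wholly in train or in test."""
    classes = {}
    for g in graphs:
        classes.setdefault(canonical_form(n, g), []).append(g)
    keys = sorted(classes)  # deterministic order before shuffling
    rng.shuffle(keys)
    n_test = max(1, int(len(keys) * test_fraction))
    test = [g for k in keys[:n_test] for g in classes[k]]
    train = [g for k in keys[n_test:] for g in classes[k]]
    return train, test, classes


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4
    graphs = [sample_graph(n, 0.5, rng) for _ in range(200)]
    train, test, classes = split_by_class(graphs, n, 0.3, rng)
    # Class frequencies are typically uneven: a sampling bias of the generator.
    freq = Counter({k: len(v) for k, v in classes.items()})
    for key, count in freq.most_common():
        print(count, key)
    print("train:", len(train), "test:", len(test))
```

A naive random split over the 200 sampled graphs would almost certainly place isomorphic graphs in both sets. The class-level split avoids that, and the printed frequencies show how unevenly a simple generator can sample the classes.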
