Graph Rationalization with Environment-based Augmentations
This work addresses the challenge of limited examples for learning graph rationales in domains like molecule and polymer property prediction, offering an incremental improvement over existing methods.
The paper tackles the problem of identifying optimal graph rationales to improve graph neural network performance. It introduces an augmentation operation, environment replacement, that creates virtual data examples, and it reports effectiveness and efficiency gains on seven molecular and four polymer datasets.
A rationale is defined as a subset of input features that best explains or supports the prediction of a machine learning model. Rationale identification has improved the generalizability and interpretability of neural networks on vision and language data. In graph applications such as molecule and polymer property prediction, identifying representative subgraph structures, called graph rationales, plays an essential role in the performance of graph neural networks. Existing graph pooling and/or distribution intervention methods suffer from a lack of examples from which to learn to identify optimal graph rationales. In this work, we introduce a new augmentation operation called environment replacement that automatically creates virtual data examples to improve rationale identification. We propose an efficient framework that performs rationale-environment separation and representation learning on the real and augmented examples in latent spaces, avoiding the high complexity of explicit graph decoding and encoding. Compared against recent techniques, experiments on seven molecular and four polymer real-world datasets demonstrate the effectiveness and efficiency of the proposed augmentation-based graph rationalization framework.
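To make the idea of environment replacement in latent space concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each graph is already encoded as a list of node embedding vectors, that a soft node mask in [0, 1] marks rationale membership, that sum pooling produces the rationale and environment vectors, and that a rationale and an environment are recombined by simple addition. The names `separate` and `environment_replacement` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def separate(node_embeddings: List[Vector], mask: List[float]) -> Tuple[Vector, Vector]:
    """Split one graph into a rationale vector and an environment vector.

    Assumption: sum pooling weighted by a soft node mask m, where m is the
    weight of a node in the rationale and (1 - m) its weight in the environment.
    """
    dim = len(node_embeddings[0])
    rationale = [0.0] * dim
    environment = [0.0] * dim
    for h, m in zip(node_embeddings, mask):
        for k in range(dim):
            rationale[k] += m * h[k]
            environment[k] += (1.0 - m) * h[k]
    return rationale, environment


def environment_replacement(
    rationales: List[Vector], environments: List[Vector]
) -> List[List[Vector]]:
    """Combine every rationale i with every environment j in a batch.

    Entry [i][j] with i != j is a virtual example: the rationale of graph i
    placed in the environment of graph j. Entry [i][i] is the real graph.
    Assumption: the combination is element-wise addition.
    """
    batch = len(rationales)
    return [
        [[r + e for r, e in zip(rationales[i], environments[j])] for j in range(batch)]
        for i in range(batch)
    ]
```

Under these assumptions, a batch of B graphs yields B real and B x (B - 1) virtual representations without decoding any graph structure, which is the kind of saving the abstract attributes to working in latent space. Each virtual example would keep the label of the graph that supplied the rationale, since the rationale is the part taken to support the prediction.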