LG SI ML · Jul 11, 2020

M-Evolve: Structural-Mapping-Based Data Augmentation for Graph Classification

Jiajun Zhou, Jie Shen, Shanqing Yu, Guanrong Chen, Qi Xuan

arXiv:2007.05700v47.928 citations

Originality: Incremental advance

AI Analysis

The paper addresses a problem faced by researchers and practitioners in domains such as drug classification and protein analysis: it offers an incremental improvement to graph classification models trained on limited data.

The paper tackles over-fitting and undergeneralization in graph classification on small-scale datasets by introducing a data augmentation framework called M-Evolve, which combines four structural mapping methods with data filtration and model retraining and reports an average accuracy improvement of 3-13% on six benchmark datasets.

Graph classification, which aims to identify the category labels of graphs, plays a significant role in drug classification, toxicity detection, protein analysis, etc. However, the limited scale of the benchmark datasets makes it easy for graph classification models to fall into over-fitting and undergeneralization. To address this, we introduce data augmentation on graphs (i.e., graph augmentation) and present four methods: random mapping, vertex-similarity mapping, motif-random mapping and motif-similarity mapping, to generate more weakly labeled data for small-scale benchmark datasets via heuristic transformation of graph structures. Furthermore, we propose a generic model evolution framework, named M-Evolve, which combines graph augmentation, data filtration and model retraining to optimize pre-trained graph classifiers. Experiments on six benchmark datasets demonstrate that the proposed framework helps existing graph classification models alleviate over-fitting and undergeneralization when training on small-scale benchmark datasets, yielding an average improvement of 3-13% in accuracy on graph classification tasks.
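To make the idea of "heuristic transformation of graph structures" concrete, here is a minimal sketch of what a random-mapping style augmentation could look like. The abstract does not spell out the procedure, so the details are assumptions: the function name `random_mapping`, the `budget` parameter, and the rule of removing a fraction of edges and adding the same number of previously absent edges are illustrative only, not the authors' implementation. The augmented graph would simply inherit the label of its source graph, which is why such data counts as weakly labeled.

```python
import random
from itertools import combinations


def random_mapping(edges, num_nodes, budget=0.2, seed=None):
    """Return a rewired copy of an undirected simple graph.

    Assumption (not taken from the abstract): a fraction `budget` of the
    existing edges is removed and the same number of previously absent
    edges is added, both chosen uniformly at random.
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must lie in [0, 1]")
    rng = random.Random(seed)

    # Normalise each edge to (small, large) so (u, v) and (v, u) coincide.
    existing = {tuple(sorted(e)) for e in edges}

    # Candidate edges to add: every node pair that is not yet connected.
    absent = [p for p in combinations(range(num_nodes), 2) if p not in existing]

    # Number of edges to swap, capped by the number of available candidates.
    k = min(int(round(budget * len(existing))), len(absent))

    removed = set(rng.sample(sorted(existing), k))
    added = set(rng.sample(absent, k))
    return sorted((existing - removed) | added)


# Usage: rewire 40% of the edges of a 5-node cycle.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
augmented = random_mapping(cycle, num_nodes=5, budget=0.4, seed=0)
print(augmented)
```

Keeping the edge count fixed is one simple way to preserve the overall density of the graph while still changing its structure. In a full pipeline of the kind the abstract describes, augmented graphs like this would next pass through a filtration step before the classifier is retrained.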
