LG Nov 1, 2021

Multi network InfoMax: A pre-training method involving graph convolutional networks

Usman Mahmood, Zening Fu, Vince Calhoun, Sergey Plis

arXiv:2111.01276v23.11 citations

Originality Incremental advance

AI Analysis

This work addresses the need for fewer labeled samples in neuroimaging classification, offering an incremental improvement over existing pre-training methods.

The paper tackles the problem of limited labeled data in deep learning by introducing a pre-training method that maximizes mutual information between embeddings from a convolutional network and a graph network. Applied to neuroimaging for schizophrenia classification, the pre-trained model needs 50% less labeled data to reach similar performance.

Discovering distinct features and their relations from data can help us uncover valuable knowledge crucial for various tasks, e.g., classification. In neuroimaging, these features could help to understand, classify, and possibly prevent brain disorders. Model introspection of highly performant overparameterized deep learning (DL) models could help find these features and relations. However, to achieve a high level of performance, DL models require numerous labeled training samples ($n$), which are rarely available in many fields. This paper presents a pre-training method involving graph convolutional/neural networks (GCNs/GNNs), based on maximizing mutual information between two high-level embeddings of an input sample. Many of the recently proposed pre-training methods pre-train one of many possible networks of an architecture. Since almost every DL model is an ensemble of multiple networks, we take our high-level embeddings from two different networks of a model: a convolutional network and a graph network. The learned high-level graph latent representations help increase performance for downstream graph classification tasks and bypass the need for a large number of labeled data samples. We apply our method to a neuroimaging dataset for classifying subjects into healthy control (HC) and schizophrenia (SZ) groups. Our experiments show that the pre-trained model significantly outperforms the non-pre-trained model and requires $50\%$ less data for similar performance.
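
To make the pre-training objective concrete, here is a minimal sketch of a contrastive mutual-information objective between two embeddings of the same sample. The abstract does not say which estimator the authors use, so the InfoNCE bound below is an assumption, as are the names `info_nce_loss`, `mi_lower_bound`, `z_cnn`, `z_gnn`, and the `temperature` parameter. The two inputs stand in for the convolutional and graph embeddings of a batch of samples; the networks that would produce them are not shown.

```python
import numpy as np


def info_nce_loss(z_cnn, z_gnn, temperature=0.1):
    """InfoNCE loss for paired embeddings, each of shape (batch, dim).

    Row i of z_cnn and row i of z_gnn are assumed to come from the same
    input sample (the positive pair); all other rows act as negatives.
    Illustrative only: not necessarily the estimator used in the paper.
    """
    # L2-normalize so that dot products are cosine similarities.
    z_cnn = z_cnn / np.linalg.norm(z_cnn, axis=1, keepdims=True)
    z_gnn = z_gnn / np.linalg.norm(z_gnn, axis=1, keepdims=True)

    # scores[i, j] compares the CNN embedding of sample i
    # with the GNN embedding of sample j.
    scores = z_cnn @ z_gnn.T / temperature

    # Row-wise log-softmax, shifted by the row maximum for stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal; minimize their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))


def mi_lower_bound(z_cnn, z_gnn, temperature=0.1):
    """InfoNCE lower bound on mutual information: log(batch) - loss."""
    return float(np.log(len(z_cnn)) - info_nce_loss(z_cnn, z_gnn, temperature))


# Toy check with random vectors standing in for network outputs.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce_loss(z, z)                       # each row paired with itself
shuffled = info_nce_loss(z, np.roll(z, 1, axis=0))  # pairs deliberately mismatched
```

In a pre-training loop under these assumptions, this loss would be minimized with respect to the parameters of both networks, so that the two embeddings of the same subject agree more with each other than with embeddings of other subjects. The bound saturates at the logarithm of the batch size, which is one reason contrastive objectives of this kind tend to benefit from larger batches.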
