LG AI CR · Dec 15, 2020

Multi-modal AsynDGAN: Learn From Distributed Medical Image Data without Sharing Private Information

Qi Chang, Zhennan Yan, Lohendran Baskaran, Hui Qu, Yikai Zhang, Tong Zhang, Shaoting Zhang, Dimitris N. Metaxas

arXiv:2012.08604v15.013 citations

Originality: Incremental advance

AI Analysis

This framework addresses privacy-preserving collaborative learning for medical image analysis. It enables the use of distributed, sensitive data without direct sharing; the inability to share such data is a significant barrier for medical research.

The paper proposes AsynDGAN, a distributed learning framework in which a central generator learns the real data distribution from multiple medical image datasets without direct data sharing. The trained generator can synthesize samples for downstream tasks, achieving performance close to that obtained with actual multi-center data, and can also augment data or complete missing modalities for individual data centers.

As deep learning technologies advance, more and more data is needed to train general and robust models for various tasks. In the medical domain, however, large-scale, multi-party data training and analysis are infeasible due to privacy and data security concerns. In this paper, we propose an extendable and elastic learning framework that preserves privacy and security while enabling collaborative learning with efficient communication. The proposed framework is named distributed Asynchronized Discriminator Generative Adversarial Networks (AsynDGAN), and consists of a centralized generator and multiple distributed discriminators. The advantages of our proposed framework are five-fold: 1) the central generator can learn the real data distribution from multiple datasets implicitly, without sharing the image data; 2) the framework is applicable to single-modality or multi-modality data; 3) the learned generator can be used to synthesize samples for downstream learning tasks, achieving performance close to that obtained with actual samples collected from multiple data centers; 4) the synthetic samples can also be used to augment data or complete missing modalities for a single data center; 5) the learning process is more efficient and requires lower bandwidth than other distributed deep learning methods.
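The "centralized generator, multiple distributed discriminators" layout can be illustrated with a toy sketch. This is not the authors' implementation: it assumes 1-D data, an affine generator, logistic discriminators, and a simple one-center-at-a-time update schedule, and the names `DataCenter`, `CentralGenerator`, and `train` are hypothetical. It only shows the communication pattern, in which synthetic samples go out to each center and gradients come back, while real samples stay local.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class DataCenter:
    """Hypothetical node: holds private real samples and a local discriminator."""

    def __init__(self, real_samples, lr=0.05):
        self._real = list(real_samples)  # private data, used only inside this object
        self.w, self.c = 0.1, 0.0        # toy discriminator D(x) = sigmoid(w*x + c)
        self.lr = lr

    def step(self, fake_batch):
        """Update the local discriminator, then return dLoss_G/dx per fake sample."""
        real_batch = random.choices(self._real, k=len(fake_batch))
        gw = gc = 0.0
        for x in real_batch:             # gradient of -log D(x) on real data
            d = sigmoid(self.w * x + self.c)
            gw += -(1.0 - d) * x
            gc += -(1.0 - d)
        for x in fake_batch:             # gradient of -log(1 - D(x)) on synthetic data
            d = sigmoid(self.w * x + self.c)
            gw += d * x
            gc += d
        n = 2 * len(fake_batch)
        self.w -= self.lr * gw / n
        self.c -= self.lr * gc / n
        # Generator loss assumed here: -log D(x), so d/dx = -(1 - D(x)) * w.
        return [-(1.0 - sigmoid(self.w * x + self.c)) * self.w for x in fake_batch]


class CentralGenerator:
    """Hypothetical central generator G(z) = a*z + b over 1-D Gaussian noise."""

    def __init__(self, lr=0.05):
        self.a, self.b = 1.0, 0.0
        self.lr = lr

    def sample(self, n):
        z = [random.gauss(0.0, 1.0) for _ in range(n)]
        return z, [self.a * zi + self.b for zi in z]

    def apply(self, z, grads_x):
        # Chain rule through x = a*z + b: dx/da = z, dx/db = 1.
        n = len(z)
        self.a -= self.lr * sum(g * zi for g, zi in zip(grads_x, z)) / n
        self.b -= self.lr * sum(grads_x) / n


def train(generator, centers, rounds=200, batch=16):
    for _ in range(rounds):
        for center in centers:           # assumed schedule: one center at a time
            z, fake = generator.sample(batch)
            grads = center.step(fake)    # only synthetic samples and gradients are exchanged
            generator.apply(z, grads)
    return generator
```

A usage example under the same assumptions: build two centers with different private distributions, for instance `DataCenter([random.gauss(2.0, 0.5) for _ in range(100)])` and `DataCenter([random.gauss(-1.0, 0.5) for _ in range(100)])`, then call `train(CentralGenerator(), centers)`. The generator object never reads `_real`; everything it learns about the real data arrives through the gradients returned by `step`.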
