IV CV LG · Jul 25, 2021

Lung Cancer Risk Estimation with Incomplete Data: A Joint Missing Imputation Perspective

Riqiang Gao, Yucheng Tang, Kaiwen Xu, Ho Hin Lee, Steve Deppen, Kim Sandler, Pierre Massion, Thomas A. Lasko, Yuankai Huo, Bennett A. Landman

arXiv:2107.11882v1 10.013 citations

Originality: Incremental advance

AI Analysis

This work addresses a critical bottleneck in clinical AI by enabling more accurate lung cancer risk prediction from incomplete multi-modal data. The advance is incremental because it extends an existing generative adversarial model, PBiGAN.

The paper tackles missing data in multi-modal clinical prediction by proposing Conditional PBiGAN (C-PBiGAN), a method for imputing a missing modality. It reports significant improvements in lung cancer risk estimation, with AUC gains of 2.9% on NLST and 4.3% on an in-house dataset relative to PBiGAN (p < 0.05).

Multi-modal data provide complementary information for clinical prediction, but missing data in clinical cohorts limit the number of subjects available in a multi-modal learning context. Imputing missing multi-modal data is challenging with existing methods when 1) the missing data span heterogeneous modalities (e.g., image vs. non-image); or 2) one modality is largely missing. In this paper, we address imputation of missing data by modeling the joint distribution of multi-modal data. Motivated by the partial bidirectional generative adversarial net (PBiGAN), we propose a new Conditional PBiGAN (C-PBiGAN) method that imputes one modality by combining conditional knowledge from another modality. Specifically, C-PBiGAN introduces a conditional latent space in a missing imputation framework that jointly encodes the available multi-modal data, along with a class regularization loss on imputed data to recover discriminative information. To our knowledge, it is the first generative adversarial model that addresses multi-modal missing imputation by modeling the joint distribution of image and non-image data. We validate our model with both the National Lung Screening Trial (NLST) dataset and an external clinical validation cohort. The proposed C-PBiGAN achieves significant improvements in lung cancer risk estimation compared with representative imputation methods (e.g., AUC values increase on both NLST (+2.9%) and an in-house dataset (+4.3%) compared with PBiGAN, p < 0.05).
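
The abstract describes two ingredients: imputing a missing modality conditioned on another one, and a class regularization loss on the imputed data. The NumPy sketch below illustrates only the general shape of those two ideas and is not the authors' implementation. The linear `toy_generator`, the linear classifier weights `w_clf`, all dimensions, and the random data are assumptions made for illustration; the abstract does not specify the architecture or the exact loss.

```python
import numpy as np

def impute(x, mask, generated):
    """Keep observed entries (mask == 1) and fill missing ones (mask == 0)."""
    return mask * x + (1.0 - mask) * generated

def toy_generator(z, cond, w_z, w_c):
    """Hypothetical linear generator: latent code z plus features from another modality."""
    return np.tanh(z @ w_z + cond @ w_c)

def class_regularization(logits, labels):
    """Mean binary cross-entropy of classifier logits computed on imputed samples."""
    # log(1 + exp(l)) - y * l, evaluated stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, logits) - labels * logits))

rng = np.random.default_rng(0)
n, d_x, d_z, d_c = 4, 6, 3, 2                      # assumed toy dimensions
x = rng.normal(size=(n, d_x))                      # target modality (toy values)
mask = (rng.random((n, d_x)) > 0.5).astype(float)  # 1 = observed, 0 = missing
z = rng.normal(size=(n, d_z))                      # latent code
cond = rng.normal(size=(n, d_c))                   # features from the other modality
w_z = rng.normal(size=(d_z, d_x))
w_c = rng.normal(size=(d_c, d_x))
w_clf = rng.normal(size=d_x)                       # hypothetical linear classifier

x_hat = impute(x, mask, toy_generator(z, cond, w_z, w_c))
labels = np.array([0.0, 1.0, 0.0, 1.0])            # toy cancer / non-cancer labels
loss = class_regularization(x_hat @ w_clf, labels)
```

In this toy setup `x` is fully populated and the mask merely hides entries; with real data the missing entries would need a placeholder value (for example zero) before masking, since multiplying NaN by zero still yields NaN.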
