LG · AI · ML · Jun 19, 2023

Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

arXiv:2306.10656v5 · 1 citation · h-index: 19
Originality: Highly original
AI Analysis

This work addresses the challenge of generating realistic virtual human data for healthcare applications. It is an incremental improvement that applies a novel method to a known bottleneck.

The paper tackles the problem of modeling high-dimensional, sparse healthcare data with heterogeneous types and systematic missingness. The result is VHGM-MAE, a masked autoencoder that outperforms existing methods in missing value imputation and synthetic data generation.

Virtual Human Generative Model (VHGM) is a generative model that approximates the joint probability over more than 2000 human healthcare-related attributes. This paper presents the core algorithm, VHGM-MAE, a masked autoencoder (MAE) tailored for handling high-dimensional, sparse healthcare data. VHGM-MAE tackles four key technical challenges: (1) heterogeneity of healthcare data types, (2) probability distribution modeling, (3) systematic missingness in the training dataset arising from multiple data sources, and (4) the high-dimensional, small-$n$-large-$p$ problem. To address these challenges, VHGM-MAE employs a likelihood-based approach to model distributions with heterogeneous types, a transformer-based MAE to capture complex dependencies among observed and missing attributes, and a novel training scheme that effectively leverages available samples with diverse missingness patterns to mitigate the small-$n$-large-$p$ problem. Experimental results demonstrate that VHGM-MAE outperforms existing methods in both missing value imputation and synthetic data generation.
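The abstract does not spell out the loss or the masking scheme, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a Gaussian likelihood for continuous attributes and a softmax likelihood for categorical ones, hides a random subset of the attributes that are actually observed in a record, and scores predictions only on that hidden subset, so records with different missingness patterns can all contribute. The schema, the attribute names, the `mask_ratio` parameter, and every function name here are hypothetical.

```python
import math
import random

# Hypothetical schema (not from the paper): attribute name -> likelihood family.
SCHEMA = {"age": "gaussian", "bmi": "gaussian", "smoker": "categorical"}


def gaussian_nll(x, mean, log_var):
    """Negative log-likelihood of x under N(mean, exp(log_var))."""
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (x - mean) ** 2 / math.exp(log_var))


def categorical_nll(k, logits):
    """Negative log-likelihood of class index k under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_z - logits[k]


def split_observed(record, mask_ratio, rng):
    """Hide a random subset of the observed attributes.

    Attributes that are missing in the record (value None) are excluded from
    both the visible input and the hidden targets.
    """
    observed = [name for name, value in record.items() if value is not None]
    rng.shuffle(observed)
    n_hidden = max(1, round(mask_ratio * len(observed))) if observed else 0
    hidden = set(observed[:n_hidden])
    visible = {name: record[name] for name in observed[n_hidden:]}
    return visible, hidden


def masked_nll(record, predictions, targets, schema=SCHEMA):
    """Average per-attribute NLL over the target attributes only."""
    total = 0.0
    for name in targets:
        if schema[name] == "gaussian":
            mean, log_var = predictions[name]
            total += gaussian_nll(record[name], mean, log_var)
        else:
            total += categorical_nll(record[name], predictions[name])
    return total / len(targets) if targets else 0.0


# Usage: "bmi" is missing in this record, so it is never used as a target.
record = {"age": 41.0, "bmi": None, "smoker": 1}
visible, hidden = split_observed(record, mask_ratio=0.5, rng=random.Random(0))

# Stand-in for decoder outputs; a real model would compute these from `visible`.
predictions = {"age": (40.0, 0.0), "bmi": (22.0, 0.0), "smoker": [0.0, 0.0]}
loss = masked_nll(record, predictions, hidden)
```

In a real masked autoencoder the `predictions` dictionary would come from a network that sees only `visible`; here it is fixed by hand so that the loss computation can be followed in isolation.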

