GN · LG · ML · Nov 27, 2019

Class-Conditional VAE-GAN for Local-Ancestry Simulation

arXiv:1911.13220v1 · 26 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the scarcity of reference data in genomics, a problem for researchers and practitioners in fields such as pharmacogenomics and precision medicine. It is, however, an incremental application of existing generative models.

The authors tackle the problem of limited reference data for local ancestry inference (LAI) by developing a class-conditional VAE-GAN that generates synthetic human genomic sequences. They report that a state-of-the-art LAI method trained on the generated data achieves performance comparable to the same method trained on real data.

Local ancestry inference (LAI) allows identification of the ancestry of all chromosomal segments in admixed individuals, and it is a critical step in the analysis of human genomes with applications from pharmacogenomics and precision medicine to genome-wide association studies. In recent years, many LAI techniques have been developed in both industry and academic research. However, these methods require large training data sets of human genomic sequences from the ancestries of interest. Such reference data sets are usually limited, proprietary, protected by privacy restrictions, or otherwise not accessible to the public. Techniques to generate training samples that resemble real haploid sequences from ancestries of interest can be useful tools in such scenarios, since a generalized model can often be shared, but the unique human sample sequences cannot. In this work we present a class-conditional VAE-GAN to generate new human genomic sequences that can be used to train local ancestry inference (LAI) algorithms. We evaluate the quality of our generated data by comparing the performance of a state-of-the-art LAI method when trained with generated versus real data.
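To make "class-conditional" concrete, here is a minimal sketch of the generator side of such a model. It is not the authors' architecture, which the abstract does not specify. It assumes that conditioning is done by concatenating a one-hot ancestry label to the latent vector, a common choice, and it uses a toy linear decoder with random, untrained weights. All names and sizes (`LATENT_DIM`, `N_CLASSES`, `N_SNPS`, `decode`) are hypothetical, and the GAN discriminator and training loop are omitted.

```python
import math
import random


def one_hot(label, n_classes):
    """Encode an ancestry class index as a one-hot list."""
    if not 0 <= label < n_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


def reparameterize(mu, logvar, rng):
    """VAE reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def decode(z, label, n_classes, weights, bias):
    """Toy linear decoder (assumption): concatenate z with the one-hot
    label and return one alternate-allele probability per SNP."""
    x = z + one_hot(label, n_classes)
    probs = []
    for w_row, b in zip(weights, bias):
        logit = sum(w * xi for w, xi in zip(w_row, x)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


def sample_haplotype(probs, rng):
    """Draw a binary haploid sequence from the per-SNP probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


# Hypothetical sizes chosen only for illustration.
LATENT_DIM, N_CLASSES, N_SNPS = 4, 3, 8

rng = random.Random(0)
# Random weights stand in for a trained decoder.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM + N_CLASSES)]
           for _ in range(N_SNPS)]
bias = [0.0] * N_SNPS

# Sample from the prior (mu = 0, logvar = 0) and decode for ancestry class 1.
z = reparameterize([0.0] * LATENT_DIM, [0.0] * LATENT_DIM, rng)
probs = decode(z, 1, N_CLASSES, weights, bias)
haplotype = sample_haplotype(probs, rng)
```

In a trained model, fixing the label and resampling `z` would yield different synthetic haplotypes for the same ancestry, which is what allows a shared model to stand in for sample sequences that cannot be shared.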
