Multi-Disease Deep Learning Framework for GWAS: Beyond Feature Selection Constraints
The work offers a scalable and biologically meaningful approach to multi-disease GWAS analysis that addresses limitations of existing deep learning methods, though its architectural improvements are incremental.
The paper tackles the problem of capturing nonlinear genetic interactions in GWAS. It develops a deep learning framework that avoids data leakage and leverages shared genetic architecture across multiple diseases, achieving AUCs of 0.68 to 0.96 on a dataset of five million SNPs across 37,000 samples.
Traditional GWAS has advanced our understanding of complex diseases but often misses nonlinear genetic interactions. Deep learning offers new opportunities to capture complex genomic patterns, yet most existing methods depend on feature selection strategies that either constrain analysis to known pathways or risk data leakage when applied across the full dataset. Moreover, including covariates can inflate predictive performance without reflecting true genetic signal. We explore deep learning architectures for GWAS and demonstrate that carefully chosen architectures can outperform existing methods under strict no-leakage conditions. Building on this, we extend our approach to a multi-label framework that jointly models five diseases, leveraging shared genetic architecture for improved efficiency and discovery. Applied to five million SNPs across 37,000 samples, our method achieves competitive predictive performance (AUC 0.68-0.96), offering a scalable, leakage-free, and biologically meaningful approach for multi-disease GWAS analysis.
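The two ideas in the abstract, avoiding leakage when a data-driven SNP filter is used and jointly modelling several diseases, can be illustrated with a small stdlib-only Python sketch. This is not the authors' method, whose architecture is not described here. Several parts are assumptions: genotypes are encoded as allele dosages (0, 1, 2); a SNP is scored by the absolute case/control difference in mean dosage, taking the maximum over diseases; and the names `snp_scores`, `select_snps`, `init_model`, `forward`, and `joint_bce` are hypothetical. The key point is that any SNP ranking is computed from training-fold rows only. The multi-label part uses plain hard parameter sharing: a shared ReLU trunk feeds one sigmoid head per disease, and the model is trained on a mean binary cross-entropy across diseases.

```python
import math
import random


def snp_scores(genotypes, labels, train_idx):
    """Hypothetical per-SNP score computed on training rows ONLY.

    For each disease: |mean dosage in cases - mean dosage in controls|.
    A SNP's score is its maximum over diseases, so a SNP relevant to any
    single disease can survive selection.
    """
    n_snps, n_dis = len(genotypes[0]), len(labels[0])
    scores = [0.0] * n_snps
    for d in range(n_dis):
        cases = [i for i in train_idx if labels[i][d] == 1]
        ctrls = [i for i in train_idx if labels[i][d] == 0]
        if not cases or not ctrls:
            continue  # no case/control contrast for this disease in the fold
        for j in range(n_snps):
            m1 = sum(genotypes[i][j] for i in cases) / len(cases)
            m0 = sum(genotypes[i][j] for i in ctrls) / len(ctrls)
            scores[j] = max(scores[j], abs(m1 - m0))
    return scores


def select_snps(genotypes, labels, train_idx, k):
    """Top-k SNP indices, ranked without ever looking at held-out labels."""
    scores = snp_scores(genotypes, labels, train_idx)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def init_model(n_in, n_hidden, n_dis, seed=0):
    """Small random weights for a shared trunk plus one head per disease."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"W1": mat(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": mat(n_dis, n_hidden), "b2": [0.0] * n_dis}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(model, x):
    """Shared ReLU hidden layer, then an independent sigmoid per disease."""
    h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(model["W1"], model["b1"])]
    return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
            for row, b in zip(model["W2"], model["b2"])]


def joint_bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over all disease labels for one sample."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / len(targets)
```

Because `select_snps` only reads labels at `train_idx`, permuting or flipping held-out labels cannot change which SNPs are kept. That invariance is the practical test for the "no-leakage" condition. Sharing the trunk lets signal from one disease shape features used by the others, which is one common way to exploit shared genetic architecture.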