GN · LG · May 2

EFGPP: Exploratory framework for genotype-phenotype prediction

arXiv:2605.02954
Predicted impact: top 98% in GN · last 90 days
Originality: Synthesis-oriented
AI Analysis

Provides a practical proof-of-concept for integrating diverse genetic and clinical data to improve complex trait prediction, but results are incremental and limited by small sample size.

EFGPP is a framework for integrating heterogeneous genetic and clinical data to predict complex traits. Applied to migraine prediction in 733 UK Biobank individuals, combining multiple data types improved AUC from 0.644 (best single type) to 0.688.

Predicting complex human traits from genetic data is challenging because different genetic, clinical, and molecular data sources often contain different parts of the signal. Here, we present EFGPP, a reproducible framework for generating, ranking, and combining multiple types of data for genotype-to-phenotype prediction. We applied EFGPP to migraine prediction using UK Biobank data from 733 individuals. The framework combined genotype-derived features, principal components, clinical and metabolomic covariates, and polygenic risk scores generated from migraine and depression GWAS using PLINK, PRSice-2, AnnoPred, and LDAK-GWAS. The best single data type achieved a test AUC of 0.644, while combining multiple data types improved performance to 0.688 using migraine-focused inputs and 0.663 using cross-trait depression-derived inputs. Genetic features alone did not outperform the covariates-only baseline, but genotype-derived features performed better than PRS alone, and depression-derived PRS showed useful predictive signal. Overall, EFGPP provides a practical proof-of-concept framework for prioritising and integrating heterogeneous genetic data sources for complex phenotype prediction.
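The rank-then-combine idea described in the abstract can be illustrated with a minimal sketch. This is not the EFGPP implementation: it assumes each data type has already been reduced to one score per individual, ranks data types by AUC on a training split, and combines the top ones by averaging z-scores (a deliberately simple stand-in for whatever model the framework actually fits). The block names, the synthetic data, and the helpers `auc`, `zscore`, `rank_blocks`, and `combine` are all hypothetical.

```python
import random
from statistics import mean, pstdev


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied case/control pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def zscore(xs):
    """Standardise a list of scores; a constant list maps to all zeros."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in xs]


def rank_blocks(blocks, labels):
    """Return (auc, name) pairs for each data type, best first."""
    return sorted(((auc(s, labels), name) for name, s in blocks.items()),
                  reverse=True)


def combine(blocks, names):
    """Average the z-scored scores of the selected data types per individual."""
    zs = [zscore(blocks[n]) for n in names]
    return [mean(col) for col in zip(*zs)]


def demo(n=400, top_k=2, seed=0):
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    # Synthetic scores only: signal strengths are arbitrary assumptions,
    # not values taken from the paper or from UK Biobank.
    signal = {"covariates": 0.6, "genotype": 0.4, "prs_migraine": 0.3, "noise": 0.0}
    blocks = {name: [w * y + rng.gauss(0, 1) for y in labels]
              for name, w in signal.items()}

    # Rank on the first half, evaluate on the held-out second half,
    # so the selection step never sees the test labels.
    half = n // 2
    train = {k: v[:half] for k, v in blocks.items()}
    test = {k: v[half:] for k, v in blocks.items()}

    ranking = rank_blocks(train, labels[:half])
    chosen = [name for _, name in ranking[:top_k]]

    best_single = auc(test[chosen[0]], labels[half:])
    combined = auc(combine(test, chosen), labels[half:])
    print("training ranking:", [(name, round(a, 3)) for a, name in ranking])
    print("best single (test AUC):", round(best_single, 3))
    print("combined top-%d (test AUC):" % top_k, round(combined, 3))


if __name__ == "__main__":
    demo()
```

Selecting data types on one split and scoring the combination on another mirrors the abstract's distinction between ranking inputs and reporting a test AUC. With a cohort as small as the 733 individuals mentioned above, the gap between single and combined AUC can vary noticeably across splits.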

