LG · AI · Feb 7, 2025

G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion

arXiv:2502.04684v3 · 1 citation · h-index: 26
Originality: Incremental advance
AI Analysis

The work addresses a fundamental challenge in genetic engineering, with applications such as crop breeding and personalized medicine, but it appears incremental because it builds on existing diffusion models and multimodal pretraining approaches.

The paper tackles cross-species genotype-to-phenotype prediction, which is limited by species-specific models and scarce labeled data. It proposes G2PDiffusion, a diffusion model that generates morphological images from DNA, conditioned on two evolutionary signals: multiple sequence alignments and environmental contexts. The authors report that this improves the model's understanding of phenotype variability across species.

Understanding how genes influence phenotype across species is a fundamental challenge in genetic engineering, and progress on it would facilitate advances in fields such as crop breeding, conservation biology, and personalized medicine. However, current phenotype prediction models are limited to individual species and depend on an expensive phenotype labeling process, which makes genotype-to-phenotype prediction a highly domain-dependent and data-scarce problem. To this end, we suggest taking images as morphological proxies, facilitating cross-species generalization through large-scale multimodal pretraining. We propose the first genotype-to-phenotype diffusion model (G2PDiffusion), which generates morphological images from DNA while considering two critical evolutionary signals: multiple sequence alignments (MSA) and environmental contexts. The model contains three novel components: 1) an MSA retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module that improves genotype-phenotype consistency. Extensive experiments show that integrating evolutionary signals with environmental context enriches the model's understanding of phenotype variability across species, thereby offering a valuable and promising exploration into advanced AI-assisted genomic analysis.
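
To make the idea of "conserved patterns" in an MSA concrete, the following is a minimal sketch that scores each alignment column by normalized Shannon entropy. It is a generic illustration, not the paper's MSA retrieval engine or encoder: the function name `column_conservation`, the toy alignment, and the choice to ignore gap characters are all assumptions made here.

```python
import math
from collections import Counter


def column_conservation(msa):
    """Score each alignment column in [0, 1]; 1 means fully conserved.

    Hypothetical helper for illustration only. `msa` is a list of
    equal-length aligned DNA strings; any character other than
    A, C, G, T (for example the gap '-') is ignored.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(4)  # entropy of a uniform distribution over 4 bases
    scores = []
    for column in zip(*msa):  # iterate over alignment columns
        bases = [b.upper() for b in column if b.upper() in "ACGT"]
        if not bases:
            scores.append(0.0)  # gap-only column carries no signal
            continue
        counts = Counter(bases)
        total = len(bases)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment (made up): columns 1-2 are identical across sequences,
# column 3 is partly variable, column 4 contains every base once.
toy_msa = ["ACGT", "ACGA", "ACTC", "AC-G"]
scores = column_conservation(toy_msa)
```

Columns with scores near 1 are the kind of conserved positions an MSA-based conditioning signal could emphasize; co-evolutionary patterns would additionally require statistics over pairs of columns, which this sketch does not cover.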

