LG · May 29, 2025

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

DeepMind · U of Toronto
arXiv:2505.23579v2 · 37 citations · h-index: 30 · Has Code
Originality: Highly original
AI Analysis

This work targets a bottleneck in biology: AI models struggle with multi-step reasoning and rarely give transparent explanations, which limits their scientific usefulness. The authors present BioReason as a transformative framework for interpretable, mechanistic AI in biology.

BioReason enables deep, interpretable biological reasoning over genomic data by integrating a DNA foundation model with a large language model. It raises KEGG-based disease pathway prediction accuracy from 86% to 98% and improves variant effect prediction by an average of 15% over baselines.
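To put the pathway result in perspective, the two reported accuracies can be restated as error rates. This worked arithmetic assumes both figures are accuracies on the same KEGG-based test set and treats the rounded percentages as exact.

```latex
\begin{aligned}
e_{\text{before}} &= 1 - 0.86 = 0.14, \qquad e_{\text{after}} = 1 - 0.98 = 0.02,\\
\Delta_{\text{acc}} &= 0.98 - 0.86 = 0.12 \quad (\text{12 percentage points}),\\
\frac{e_{\text{before}} - e_{\text{after}}}{e_{\text{before}}} &= \frac{0.12}{0.14} \approx 0.857.
\end{aligned}
```

Under these assumptions, the 12-point accuracy gain corresponds to a relative error reduction of roughly 85.7%, or about seven times fewer misclassifications (0.14 / 0.02 = 7). The 15% variant-effect figure is only given as an average, so it is not broken down here.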

Unlocking deep and interpretable biological reasoning from complex genomic data remains a major AI challenge limiting scientific progress. While current DNA foundation models excel at representing sequences, they struggle with multi-step reasoning and lack transparent, biologically meaningful explanations. BioReason addresses this by tightly integrating a DNA foundation model with a large language model (LLM), enabling the LLM to directly interpret and reason over genomic information. Through supervised fine-tuning and reinforcement learning, BioReason learns to produce logical, biologically coherent deductions. It achieves major performance gains, boosting KEGG-based disease pathway prediction accuracy from 86% to 98% and improving variant effect prediction by an average of 15% over strong baselines. BioReason can reason over unseen biological entities and explain its decisions step by step, offering a transformative framework for interpretable, mechanistic AI in biology. All data, code, and checkpoints are available at https://github.com/bowang-lab/BioReason
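The abstract says only that the DNA foundation model is "tightly integrated" with the LLM so that the LLM can read genomic information directly. One common way to do this, assumed here and not confirmed by this page, is to map each DNA-encoder embedding into the LLM's embedding space with a learned linear projection and splice the results between text-token embeddings. The pure-Python sketch shows only that splicing step; `linear`, `build_llm_inputs`, and all dimensions and weights are hypothetical toy values, and neither the supervised fine-tuning nor the reinforcement learning stage is shown.

```python
"""Hypothetical sketch: splice projected DNA-encoder embeddings into an LLM's
input embedding sequence. Toy dimensions and hand-set weights; this is an
assumed design, not BioReason's actual code."""

from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute weight @ x + bias, mapping a DNA embedding (width d_dna)
    to the LLM embedding width (d_llm = number of rows in weight)."""
    if any(len(row) != len(x) for row in weight) or len(bias) != len(weight):
        raise ValueError("weight/bias shapes do not match the input embedding")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_llm_inputs(
    prefix_text: List[Vector],
    dna_embeddings: List[Vector],
    suffix_text: List[Vector],
    weight: Matrix,
    bias: Vector,
) -> List[Vector]:
    """Return [text prefix] + [projected DNA tokens] + [text suffix] as one
    embedding sequence that an LLM could consume in place of token IDs."""
    projected = [linear(e, weight, bias) for e in dna_embeddings]
    return prefix_text + projected + suffix_text


if __name__ == "__main__":
    # Toy sizes: DNA-encoder width 3, LLM width 2 (hypothetical values).
    W = [[1.0, 0.0, 1.0],
         [0.0, 2.0, 0.0]]
    b = [0.5, -0.5]
    dna = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # two DNA-encoder token embeddings
    prompt = [[0.1, 0.2]]                      # one text embedding before the DNA tokens
    question = [[0.3, 0.4]]                    # one text embedding after the DNA tokens
    seq = build_llm_inputs(prompt, dna, question, W, b)
    print(len(seq), seq[1], seq[2])
```

Running it prints `4 [4.5, 3.5] [0.5, 1.5]`: the prompt embedding, two projected DNA embeddings, and the question embedding. In this assumed design, the projection weights would be learned during fine-tuning rather than set by hand, which lets the LLM treat DNA segments as if they were extra input tokens.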

Code Implementations: 1 repo
