QM AI | Oct 1, 2025

BioVERSE: Representation Alignment of Biomedical Modalities to LLMs for Multi-Modal Reasoning

Ching-Huei Tsou, Michal Ozery-Flato, Ella Barkan, Diwakar Mahajan, Ben Shapira

arXiv:2510.01428v1 | 1.22 citations | h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses the challenge of integrating diverse biomedical data modalities so that they can be reasoned over jointly, offering a practical solution for researchers and clinicians. The contribution is incremental, however, because it builds on existing models and combines them using alignment techniques.

The paper tackles the problem that biomedical foundation models and large language models live in disjoint embedding spaces, which limits cross-modal reasoning. It introduces BIOVERSE, a two-stage approach that aligns these modalities through lightweight projection layers followed by instruction tuning, and reports that it outperforms larger baselines on tasks such as cell-type annotation and molecular description.
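To make the idea of a lightweight, modality-specific projection layer concrete, here is a minimal pure-Python sketch. It assumes, as in LLaVA-style models, that each projector is a small MLP mapping one frozen BioFM embedding into a few vectors in the LLM's token-embedding space, which are then spliced into the prompt at a placeholder. The MLP shape, the sizes, the `<cell>` placeholder, and `toy_embed` are hypothetical illustrations, not details taken from the paper.

```python
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small random weights; in practice these would be learned during alignment.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


class ModalityProjector:
    """Hypothetical two-layer MLP: one BioFM embedding -> n_soft vectors in LLM space."""

    def __init__(self, d_in: int, d_hidden: int, d_llm: int, n_soft: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_llm = d_llm
        self.n_soft = n_soft
        self.w1 = init_matrix(d_hidden, d_in, rng)
        self.w2 = init_matrix(d_llm * n_soft, d_hidden, rng)

    def __call__(self, x: Vector) -> List[Vector]:
        hidden = [max(0.0, v) for v in matvec(self.w1, x)]  # ReLU nonlinearity
        flat = matvec(self.w2, hidden)
        # Split the flat output into n_soft "soft tokens" of size d_llm each.
        return [flat[i * self.d_llm:(i + 1) * self.d_llm] for i in range(self.n_soft)]


PLACEHOLDER = "<cell>"  # hypothetical marker for where the modality goes in the prompt


def splice(tokens: List[str], embed: Callable[[str], Vector], soft: List[Vector]) -> List[Vector]:
    """Build the LLM input sequence: text tokens go through the (frozen) embedding
    lookup, and the placeholder is replaced by the projected soft tokens."""
    seq: List[Vector] = []
    for tok in tokens:
        if tok == PLACEHOLDER:
            seq.extend(soft)
        else:
            seq.append(embed(tok))
    return seq


def toy_embed(tok: str) -> Vector:
    # Stand-in for the LLM's token-embedding lookup (not a real tokenizer).
    return [float(len(tok))] * 6


proj = ModalityProjector(d_in=8, d_hidden=16, d_llm=6, n_soft=2)
cell_embedding = [0.1 * i for i in range(8)]  # stand-in for a frozen BioFM output
soft_tokens = proj(cell_embedding)
sequence = splice(["What", "cell", "type", "is", PLACEHOLDER, "?"], toy_embed, soft_tokens)
print(len(sequence), len(sequence[0]))  # 7 6
```

In this sketch only the projector's weights would be trained; the BioFM that produces `cell_embedding` and the LLM's embedding table stay fixed. A second modality (for example, molecules) would get its own `ModalityProjector` with a different `d_in` but the same `d_llm`, which is what lets both land in one shared space.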

Recent advances in large language models (LLMs) and biomedical foundation models (BioFMs) have achieved strong results in biological text reasoning, molecular modeling, and single-cell analysis, yet they remain siloed in disjoint embedding spaces, limiting cross-modal reasoning. We present BIOVERSE (Biomedical Vector Embedding Realignment for Semantic Engagement), a two-stage approach that adapts pretrained BioFMs as modality encoders and aligns them with LLMs through lightweight, modality-specific projection layers. The approach first aligns each modality to a shared LLM space through independently trained projections, allowing them to interoperate naturally, and then applies standard instruction tuning with multi-modal data to bring them together for downstream reasoning. By unifying raw biomedical data with knowledge embedded in LLMs, the approach enables zero-shot annotation, cross-modal question answering, and interactive, explainable dialogue. Across tasks spanning cell-type annotation, molecular description, and protein function reasoning, compact BIOVERSE configurations surpass larger LLM baselines while enabling richer, generative outputs than existing BioFMs, establishing a foundation for principled multi-modal biomedical reasoning.
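The first stage, in which each modality's projection is trained independently against a shared LLM space, can be sketched as follows. The paper's exact stage-1 objective is not stated in this summary, so the sketch swaps in a simple stand-in: a linear projector fitted by stochastic gradient descent on mean-squared error between the projected BioFM embedding and a target vector in LLM space (for example, a pooled LLM embedding of the paired text description). The data are synthetic, and all names, sizes, and hyperparameters are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Pair = Tuple[Vector, Vector]  # (frozen BioFM embedding, target vector in LLM space)


def apply(w: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def mse(w: Matrix, pairs: List[Pair]) -> float:
    total = 0.0
    for x, y in pairs:
        total += sum((p - t) ** 2 for p, t in zip(apply(w, x), y)) / len(y)
    return total / len(pairs)


def train_projector(pairs: List[Pair], d_in: int, d_llm: int,
                    lr: float = 0.05, epochs: int = 200, seed: int = 0) -> Matrix:
    """Fit W (d_llm x d_in) by SGD on 0.5 * ||W x - y||^2.

    Only W is updated: the BioFM that produced x and the LLM that produced y stay frozen.
    """
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(d_llm)]
    for _ in range(epochs):
        for x, y in pairs:
            pred = apply(w, x)
            for i in range(d_llm):
                err = pred[i] - y[i]  # gradient w.r.t. W[i][j] is err * x[j]
                for j in range(d_in):
                    w[i][j] -= lr * err * x[j]
    return w


def synthetic_pairs(d_in: int, d_llm: int, n: int, seed: int) -> List[Pair]:
    """Toy data: targets come from a hidden linear map, so a perfect projector exists."""
    rng = random.Random(seed)
    true_w = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_llm)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(n)]
    return [(x, apply(true_w, x)) for x in xs]


D_LLM = 4  # shared target dimension (hypothetical LLM hidden size)
datasets: Dict[str, List[Pair]] = {
    "cell": synthetic_pairs(d_in=5, d_llm=D_LLM, n=30, seed=1),
    "molecule": synthetic_pairs(d_in=3, d_llm=D_LLM, n=30, seed=2),
}

# Each modality gets its own projector, trained without touching the others.
projectors: Dict[str, Matrix] = {}
for modality, pairs in datasets.items():
    d_in = len(pairs[0][0])
    projectors[modality] = train_projector(pairs, d_in, D_LLM)
    print(modality, mse(projectors[modality], pairs))
```

Because every projector maps into the same `D_LLM`-dimensional space, the modalities can be aligned separately and still interoperate afterwards; the second stage (instruction tuning on multi-modal data) is not shown here.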

