CVApr 12, 2022Code
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object DetectionHaibao Yu, Yizhen Luo, Mao Shu et al.
Autonomous driving faces great safety challenges for a lack of global perspective and the limitation of long-range perception capabilities. It has been widely agreed that vehicle-infrastructure cooperation is required to achieve Level 5 autonomy. However, there is still NO dataset from real scenarios available for computer vision researchers to work on vehicle-infrastructure cooperation-related problems. To accelerate computer vision research and innovation for Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD), we release DAIR-V2X Dataset, which is the first large-scale, multi-modality, multi-view dataset from real scenarios for VICAD. DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations. The Vehicle-Infrastructure Cooperative 3D Object Detection problem (VIC3D) is introduced, formulating the problem of collaboratively locating and identifying 3D objects using sensory inputs from both vehicle and infrastructure. In addition to solving traditional 3D object detection problems, the solution of VIC3D needs to consider the temporal asynchrony problem between vehicle and infrastructure sensors and the data transmission cost between them. Furthermore, we propose Time Compensation Late Fusion (TCLF), a late fusion framework for the VIC3D task as a benchmark based on DAIR-V2X. Find data, code, and more up-to-date information at https://thudair.baai.ac.cn/index and https://github.com/AIR-THU/DAIR-V2X.
BMJun 6, 2023Code
MolFM: A Multimodal Molecular Foundation ModelYizhen Luo, Kai Yang, Massimo Hong et al.
Molecular knowledge resides within three different modalities of information sources: molecular structures, biomedical documents, and knowledge bases. Effective incorporation of molecular knowledge from these modalities holds paramount significance in facilitating biomedical research. However, existing multimodal molecular foundation models exhibit limitations in capturing intricate connections between molecular structures and texts, and more importantly, none of them attempt to leverage a wealth of molecular expertise derived from knowledge graphs. In this study, we introduce MolFM, a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs. We propose cross-modal attention between atoms of molecular structures, neighbors of molecule entities and semantically related texts to facilitate cross-modal comprehension. We provide theoretical analysis that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in the feature space between different modalities of the same molecule, as well as molecules sharing similar structures or functions. MolFM achieves state-of-the-art performance on various downstream tasks. On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively. Furthermore, qualitative analysis showcases MolFM's implicit ability to provide grounding from molecular substructures and knowledge graphs. Code and models are available on https://github.com/BioFM/OpenBioMed.
QMMay 1
Co-Generative De Novo Functional Protein DesignXinrui Chen, Yizhen Luo, Siqi Fan et al.
De novo functional protein design aims to generate protein sequences that realize specified biochemical functions without relying on evolutionary templates, enabling broad applications in biotechnology and medicine. Existing approaches adopt either direct function-to-sequence mapping or decoupled structure-sequence generation strategies but often fail to achieve functionality and foldability simultaneously. To address this, we propose CodeFP, a Co-generative protein language model for de novo Functional Protein design that simultaneously decodes sequence and structure tokens, thereby enabling superior simultaneous realization of functionality and foldability. CodeFP utilizes functional local structures to enrich functional semantic encodings, overcoming the suboptimal translation of flat encodings into structure tokens, while introducing auxiliary functional supervision to alleviate training ambiguity stemming from the one-to-many structure-to-token mapping. Extensive experiments show that CodeFP consistently achieves average improvements of 6.1% in functional consistency and 3.2% in foldability over the strongest baseline.
LGApr 17, 2023
Towards Unified AI Drug Discovery with Multiple Knowledge ModalitiesYizhen Luo, Xing Yi Liu, Kai Yang et al.
In recent years, AI models that mine intrinsic patterns from molecular structures and protein sequences have shown promise in accelerating drug discovery. However, these methods partly lag behind real-world pharmaceutical approaches of human experts that additionally grasp structured knowledge from knowledge bases and unstructured knowledge from biomedical literature. To bridge this gap, we propose KEDD, a unified, end-to-end, and multimodal deep learning framework that optimally incorporates both structured and unstructured knowledge for vast AI drug discovery tasks. The framework first extracts underlying characteristics from heterogeneous inputs, and then applies multimodal fusion for accurate prediction. To mitigate the problem of missing modalities, we leverage multi-head sparse attention and a modality masking mechanism to extract relevant information robustly. Benefiting from integrated knowledge, our framework achieves a deeper understanding of molecule entities, brings significant improvements over state-of-the-art methods on a wide range of tasks and benchmarks, and reveals its promising potential in assisting real-world drug discovery.
LGOct 30, 2024Code
MutaPLM: Protein Language Modeling for Mutation Explanation and EngineeringYizhen Luo, Zikun Nie, Massimo Hong et al.
Studying protein mutations within amino acid sequences holds tremendous significance in life sciences. Protein language models (PLMs) have demonstrated strong capabilities in broad biological applications. However, due to architectural design and lack of supervision, PLMs model mutations implicitly with evolutionary plausibility, which is not satisfactory to serve as explainable and engineerable tools in real-world studies. To address these issues, we present MutaPLM, a unified framework for interpreting and navigating protein mutations with protein language models. MutaPLM introduces a protein delta network that captures explicit protein mutation representations within a unified feature space, and a transfer learning pipeline with a chain-of-thought (CoT) strategy to harvest protein mutation knowledge from biomedical texts. We also construct MutaDescribe, the first large-scale protein mutation dataset with rich textual annotations, which provides cross-modal supervision signals. Through comprehensive experiments, we demonstrate that MutaPLM excels at providing human-understandable explanations for mutational effects and prioritizing novel mutations with desirable properties. Our code, model, and data are open-sourced at https://github.com/PharMolix/MutaPLM.
BMMar 12, 2025Code
PharMolixFM: All-Atom Foundation Models for Molecular Modeling and GenerationYizhen Luo, Jiashuo Wang, Siqi Fan et al.
Structural biology relies on accurate three-dimensional biomolecular structures to advance our understanding of biological functions, disease mechanisms, and therapeutics. While recent advances in deep learning have enabled the development of all-atom foundation models for molecular modeling and generation, existing approaches face challenges in generalization due to the multi-modal nature of atomic data and the lack of comprehensive analysis of training and sampling strategies. To address these limitations, we propose PharMolixFM, a unified framework for constructing all-atom foundation models based on multi-modal generative techniques. Our framework includes three variants using state-of-the-art multi-modal generative models. By formulating molecular tasks as a generalized denoising process with task-specific priors, PharMolixFM achieves robust performance across various structural biology applications. Experimental results demonstrate that PharMolixFM-Diff achieves competitive prediction accuracy in protein-small-molecule docking (83.9% vs. 90.2% RMSD < 2Å, given pocket) with significantly improved inference speed. Moreover, we explore the empirical inference scaling law by introducing more sampling repeats or steps. Our code and model are available at https://github.com/PharMolix/OpenBioMed.
LGJun 14, 2024Code
Learning Multi-view Molecular Representations with Structured and Unstructured KnowledgeYizhen Luo, Kai Yang, Massimo Hong et al.
Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular representations, due to challenges in explicitly incorporating view information and handling molecular knowledge from heterogeneous sources. To address these issues, we present MV-Mol, a molecular representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs. We utilize text prompts to model view information and design a fusion architecture to extract view-based molecular representations. We develop a two-stage pre-training procedure, exploiting heterogeneous data of varying quality and quantity. Through extensive experiments, we show that MV-Mol provides improved representations that substantially benefit molecular property prediction. Additionally, MV-Mol exhibits state-of-the-art performance in multi-modal comprehension of molecular structures and texts. Code and data are available at https://github.com/PharMolix/OpenBioMed.
LGApr 21
Evaluation-driven Scaling for Scientific DiscoveryHaotian Ye, Haowei Lin, Jingyi Tang et al.
Language models are increasingly used in scientific discovery to generate hypotheses, propose candidate solutions, implement systems, and iteratively refine them. At the core of these trial-and-error loops lies evaluation: the process of obtaining feedback on candidate solutions via verifiers, simulators, or task-specific scoring functions. While prior work has highlighted the importance of evaluation, it has not explicitly formulated the problem of how evaluation-driven discovery loops can be scaled up in a principled and effective manner to push the boundaries of scientific discovery, a problem this paper seeks to address. We introduce Simple Test-time Evaluation-driven Scaling (SimpleTES), a general framework that strategically combines parallel exploration, feedback-driven refinement, and local selection, revealing substantial gains unlocked by scaling evaluation-driven discovery loops along the right dimensions. Across 21 scientific problems spanning six domains, SimpleTES discovers state-of-the-art solutions using gpt-oss models, consistently outperforming both frontier-model baselines and sophisticated optimization pipelines. Particularly, we sped up the widely used LASSO algorithm by over 2x, designed quantum circuit routing policies that reduce gate overhead by 24.5%, and discovered new Erdos minimum overlap constructions that surpass the best-known results. Beyond novel discoveries, SimpleTES produces trajectory-level histories that naturally supervise feedback-driven learning. When post-trained on successful trajectories, models not only improve efficiency on seen problems but also generalize to unseen problems, discovering solutions that base models fail to uncover. Together, our results establish effective evaluation-driven loop scaling as a central axis for advancing LLM-driven scientific discovery, and provide a simple yet practical framework for realizing these gains.
GNJul 15, 2025
SToFM: a Multi-scale Foundation Model for Spatial TranscriptomicsSuyuan Zhao, Yizhen Luo, Ganbo Yang et al.
Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving spatial context of cells. Building foundational models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging due to the need to extract multi-scale information from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice, to construct a set of ST sub-slices that aggregate macro-, micro- and gene-scale information. Then an SE(2) Transformer is used to obtain high-quality cell representations from the sub-slices. Additionally, we construct \textbf{SToCorpus-88M}, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data through capturing and integrating multi-scale information.
AIMar 31
Reinforced Reasoning for End-to-End Retrosynthetic PlanningChenyang Zuo, Siqi Fan, Yizhen Luo et al.
Retrosynthetic planning is a fundamental task in organic chemistry, yet remains challenging due to its combinatorial complexity. To address this, conventional approaches typically rely on hybrid frameworks that combine single-step predictions with external search heuristics, inevitably fracturing the logical coherence between local molecular transformations and global planning objectives. To bridge this gap and embed sophisticated strategic foresight directly into the model's chemical reasoning, we introduce ReTriP, an end-to-end generative framework that reformulates retrosynthesis as a direct Chain-of-Thought reasoning task. We establish a path-coherent molecular representation and employ a progressive training curriculum that transitions from reasoning distillation to reinforcement learning with verifiable rewards, effectively aligning stepwise generation with practical route utility. Empirical evaluation on RetroBench demonstrates that ReTriP achieves state-of-the-art performance, exhibiting superior robustness in long-horizon planning compared to hybrid baselines.
GNMay 9, 2024
LangCell: Language-Cell Pre-training for Cell Identity UnderstandingSuyuan Zhao, Jiahuan Zhang, Yushuai Wu et al.
Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from the transcriptomic data, such as annotating cell types, has become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce $\textbf{LangCell}$, the first $\textbf{Lang}$uage-$\textbf{Cell}$ pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.
AIMar 6, 2024
DeepCRE: Transforming Drug R&D via AI-Driven Cross-drug Response EvaluationYushuai Wu, Ting Zhang, Hao Zhou et al.
The fields of therapeutic application and drug research and development (R&D) both face substantial challenges, i.e., the therapeutic domain calls for more treatment alternatives, while numerous promising pre-clinical drugs have failed in clinical trials. One of the reasons is the inadequacy of Cross-drug Response Evaluation (CRE) during the late stages of drug R&D. Although in-silico CRE models bring a promising solution, existing methodologies are restricted to early stages of drug R&D, such as target and cell-line levels, offering limited improvement to clinical success rates. Herein, we introduce DeepCRE, a pioneering AI model designed to predict CRE effectively in the late stages of drug R&D. DeepCRE outperforms the existing best models by achieving an average performance improvement of 17.7% in patient-level CRE, and a 5-fold increase in indication-level CRE, facilitating more accurate personalized treatment predictions and better pharmaceutical value assessment for indications, respectively. Furthermore, DeepCRE has identified a set of six drug candidates that show significantly greater effectiveness than a comparator set of two approved drugs in 5/8 colorectal cancer organoids. This demonstrates the capability of DeepCRE to systematically uncover a spectrum of drug candidates with enhanced therapeutic effects, highlighting its potential to transform drug R&D.
SIMar 1, 2021
CogDL: A Comprehensive Library for Graph Deep LearningYukuo Cen, Zhenyu Hou, Yan Wang et al.
Graph neural networks (GNNs) have attracted tremendous attention from the graph learning community in recent years. It has been widely adopted in various real-world applications from diverse domains, such as social networks and biological graphs. The research and applications of graph deep learning present new challenges, including the sparse nature of graph data, complicated training of GNNs, and non-standard evaluation of graph tasks. To tackle the issues, we present CogDL, a comprehensive library for graph deep learning that allows researchers and practitioners to conduct experiments, compare methods, and build applications with ease and efficiency. In CogDL, we propose a unified design for the training and evaluation of GNN models for various graph tasks, making it unique among existing graph learning libraries. By utilizing this unified trainer, CogDL can optimize the GNN training loop with several training techniques, such as mixed precision training. Moreover, we develop efficient sparse operators for CogDL, enabling it to become the most competitive graph library for efficiency. Another important CogDL feature is its focus on ease of use with the aim of facilitating open and reproducible research of graph learning. We leverage CogDL to report and maintain benchmark results on fundamental graph tasks, which can be reproduced and directly used by the community.