100.0 | CE | May 1 | Code
D-Flow: Multi-modality Flow Matching for D-peptide Design
Fang Wu, Shuting Jin, Xiangru Tang et al.
D-peptides are resistant to proteolysis, exhibit greater in vivo stability, and are easier to synthesize. Despite advances in deep learning for peptide discovery, the scarcity of natural D-protein data limits the transfer of existing generative models to the D-peptide chemical space. We propose D-Flow, a full-atom flow-based framework for de novo D-peptide design. Conditioned on receptor binding, D-Flow uses structural representations incorporating backbone frames, side-chain angles, and discrete amino acid types. A mirror-image algorithm is implemented to address the lack of training data for D-proteins by converting the chirality of L-receptors. Furthermore, we enhance D-Flow's capacity by integrating protein language models (PLMs) with structural awareness through a lightweight structural adapter that injects structural representations into PLM embeddings. This enables D-Flow to learn conformational priors in the D-peptide chemical space and to accommodate the chiral selectivity of binding sites, thereby mitigating the scarcity of D-peptide data. A two-stage training pipeline and a control toolkit enable D-Flow to transition from general protein design to targeted binder design while preserving pre-training knowledge. Results on the PepMerge benchmark show D-Flow's effectiveness. D-peptides generated by D-Flow align more closely with native sequences and structures, with sequence identity improving by 10.2% over the best baseline, and the top affinity score reaching 24.31%. Overall, D-Flow shows potential for D-peptide design, facilitating the development of bioorthogonal and stable molecular tools and diagnostics. Code is available at https://github.com/smiles724/PeptideDesign.
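The chirality conversion mentioned in this abstract can be illustrated with a plain coordinate reflection. The sketch below is not taken from the D-Flow code; it assumes the simplest possible mirror operation (negating one coordinate axis), and the function names and toy coordinates are hypothetical. It shows that a reflection flips the handedness of a stereocenter while leaving all interatomic distances unchanged.

```python
import math
from itertools import combinations


def mirror(coords, axis=0):
    """Reflect every point through the plane perpendicular to `axis`."""
    return [tuple(-c if i == axis else c for i, c in enumerate(p)) for p in coords]


def signed_volume(center, a, b, c):
    """Scalar triple product of the three bonds leaving `center`.

    Its sign encodes the handedness of the four-atom arrangement.
    """
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return sum(u[i] * cross[i] for i in range(3))


def pairwise_distances(coords):
    """All interatomic distances, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


# Hypothetical toy coordinates (not real residue geometry): CA, N, C, CB.
l_residue = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
d_residue = mirror(l_residue)

# Handedness flips sign; distances are preserved.
print(signed_volume(*l_residue), signed_volume(*d_residue))
```

Because a reflection is an isometry, any geometric feature built only from distances is identical for the original and mirrored structures; only chirality-sensitive quantities such as the signed volume change.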
LG | Apr 8, 2023 | Code
Instructor-inspired Machine Learning for Robust Molecular Property Prediction
Fang Wu, Shuting Jin, Siyuan Li et al.
Machine learning is catalyzing a revolution in chemical and biological science. However, its efficacy heavily depends on the availability of labeled data, and annotating biochemical data is extremely laborious. To surmount this data sparsity challenge, we present an instructive learning algorithm named InstructMol to measure pseudo-labels' reliability and help the target model leverage large-scale unlabeled data. InstructMol does not require transferring knowledge between multiple domains, which avoids the potential gap between the pretraining and fine-tuning stages. We demonstrated the high accuracy of InstructMol on several real-world molecular datasets and out-of-distribution (OOD) benchmarks. Code is available at https://github.com/smiles724/InstructMol.
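One generic way to use a reliability score for pseudo-labels is to weight each sample's loss by it. The sketch below illustrates that general idea only; it is an assumption, not the InstructMol objective, and the function name and the numbers are hypothetical.

```python
def reliability_weighted_loss(preds, targets, reliabilities):
    """Squared error averaged with per-sample reliability weights in [0, 1].

    Labeled samples can be given weight 1.0; pseudo-labeled samples get the
    reliability score assigned to their pseudo-label.
    """
    total = sum(r * (p - t) ** 2 for p, t, r in zip(preds, targets, reliabilities))
    norm = sum(reliabilities)
    return total / norm if norm > 0 else 0.0


# Hypothetical example: one labeled sample and one pseudo-labeled sample.
preds = [1.0, 2.0]
targets = [1.0, 0.0]               # the second target is a pseudo-label
trusted = reliability_weighted_loss(preds, targets, [1.0, 1.0])
distrusted = reliability_weighted_loss(preds, targets, [1.0, 0.0])
print(trusted, distrusted)
```

Setting a pseudo-label's reliability to zero removes it from the loss entirely, so an unreliable pseudo-label cannot pull the model toward a wrong target.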
LG | Jan 8
Surface-based Molecular Design with Multi-modal Flow Matching
Fang Wu, Zhengyuan Zhou, Shuting Jin et al.
Therapeutic peptides show promise in targeting previously undruggable binding sites, with recent advancements in deep generative models enabling full-atom peptide co-design for specific protein receptors. However, the critical role of molecular surfaces in protein-protein interactions (PPIs) has been underexplored. To bridge this gap, we propose an omni-design peptide generation paradigm called SurfFlow, a novel surface-based generative algorithm that enables comprehensive co-design of sequence, structure, and surface for peptides. SurfFlow employs a multi-modality conditional flow matching (CFM) architecture to learn distributions of surface geometries and biochemical properties, enhancing peptide binding accuracy. Evaluated on the comprehensive PepMerge benchmark, SurfFlow consistently outperforms full-atom baselines across all metrics. These results highlight the advantages of considering molecular surfaces in de novo peptide discovery and demonstrate the potential of integrating multiple protein modalities for more effective therapeutic peptide discovery.
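For readers unfamiliar with conditional flow matching, the following is a minimal sketch of its most common Euclidean form: a straight-line path between a noise sample and a data sample, with the constant displacement as the regression target. This is an illustration under that assumption; the abstract does not say which paths SurfFlow uses for each modality, and all names below are hypothetical.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 to data x1 at time t."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the linear path: the constant displacement."""
    return [b - a for a, b in zip(x0, x1)]


def cfm_loss(predicted_velocity, x0, x1):
    """Mean squared error between a predicted and the target velocity."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_velocity, target)) / len(target)


def euler_integrate(x0, velocity_fn, steps=10):
    """Generate a sample by integrating dx/dt = velocity_fn(x, t) from t=0 to 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical 3-D example with an oracle velocity field standing in for a network.
noise = [0.0, 0.0, 0.0]
data = [1.0, -2.0, 0.5]
oracle = lambda x, t: target_velocity(noise, data)
sample = euler_integrate(noise, oracle)
print(sample)
```

In training, a neural network would replace `oracle` and be fitted by minimizing `cfm_loss` at points returned by `interpolate` for random `t`; sampling then integrates the learned field from noise.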
LG | Nov 12, 2025
DeepDR: an integrated deep-learning model web server for drug repositioning
Shuting Jin, Yi Jiang, Yimin Liu et al.
Background: Identifying new indications for approved drugs is a complex and time-consuming process that requires extensive knowledge of pharmacology, clinical data, and advanced computational methods. Recently, deep learning (DL) methods have shown their capability for the accurate prediction of drug repositioning. However, implementing DL-based modeling requires in-depth domain knowledge and proficient programming skills. Results: In this application, we introduce DeepDR, the first integrated platform that combines a variety of established DL-based models for disease- and target-specific drug repositioning tasks. To recommend candidate drugs, DeepDR draws on more than 15 networks and a comprehensive knowledge graph that includes 5.9 million edges across 107 types of relationships connecting drugs, diseases, proteins/genes, pathways, and expression, built from six existing databases and a large scientific corpus of 24 million PubMed publications. Additionally, the recommended results include detailed descriptions of the recommended drugs and visualize key patterns with interpretability through a knowledge graph. Conclusion: DeepDR is free and open to all users without the requirement of registration. We believe it can provide an easy-to-use, systematic, highly accurate, and computationally automated platform for both experimental and computational scientists.
LG | Jul 20, 2021 | Code
Heterogeneous network-based drug repurposing for COVID-19
Shuting Jin, Xiangxiang Zeng, Wei Huang et al.
Coronavirus disease 2019 (COVID-19), which is caused by a human coronavirus (HCoV), has spread rapidly around the world. Compared with new drug development, drug repurposing may be the best shortcut for treating COVID-19. Therefore, we constructed a comprehensive heterogeneous network based on HCoV-related target proteins and used the previously proposed deepDTnet to discover potential drug candidates for COVID-19. We obtained high performance in predicting possible drugs effective against COVID-19-related proteins. In summary, this work utilizes a powerful heterogeneous network-based deep learning method, which may help to quickly identify candidate repurposable drugs for future clinical trials for COVID-19. The code and data are available at https://github.com/stjin-XMU/HnDR-COVID.
LG | Feb 13, 2022
Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport
Fang Wu, Nicolas Courty, Shuting Jin et al.
Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties.