LGJul 10, 2024
A deep graph model for the signed interaction prediction in biological networkShuyi Jin, Mengji Zhang, Meijie Wang et al.
Predicting signed interactions in biological networks is crucial for understanding drug mechanisms and facilitating drug repurposing. While deep graph models have demonstrated success in modeling complex biological systems, existing approaches often fail to distinguish between positive and negative interactions, limiting their utility for precise pharmacological predictions. In this study, we propose a novel deep graph model, \textbf{RGCNTD} (Relational Graph Convolutional Network with Tensor Decomposition), designed to predict both polar (e.g., activation, inhibition) and non-polar (e.g., binding, affect) chemical-gene interactions. Our model integrates graph convolutional networks with tensor decomposition to enhance feature representation and incorporates a conflict-aware sampling strategy to resolve polarity ambiguities. We introduce new evaluation metrics, \textit{AUC\textsubscript{polarity}} and \textit{CP@500}, to assess the model's ability to differentiate interaction types. Experimental results demonstrate that \textbf{RGCNTD} outperforms baseline models, achieving superior classification accuracy and improved discrimination of polar edges. Furthermore, we analyze the impact of subgraph components on predictive performance, revealing that additional network structures do not always enhance accuracy. These findings highlight the importance of polarity-aware modeling in drug discovery and network pharmacology, providing a robust framework for predicting complex biological interactions.
LGJan 23
Safe Multitask Molecular Graph Networks for Vapor Pressure and Odor Threshold PredictionShuang Wu, Meijie Wang, Lun Yu
We investigate two important tasks in odor-related property modeling: Vapor Pressure (VP) and Odor Threshold (OP). To evaluate the model's out-of-distribution (OOD) capability, we adopt the Bemis-Murcko scaffold split. In terms of features, we introduce the rich A20/E17 molecular graph features (20-dimensional atom features + 17-dimensional bond features) and systematically compare GINE and PNA backbones. The results show: for VP, PNA with a simple regression head achieves Val MSE $\approx$ 0.21 (normalized space); for the OP single task under the same scaffold split, using A20/E17 with robust training (Huber/winsor) achieves Val MSE $\approx$ 0.60-0.61. For multitask training, we propose a **"safe multitask"** approach: VP as the primary task and OP as the auxiliary task, using delayed activation + gradient clipping + small weight, which avoids harming the primary task and simultaneously yields the best VP generalization performance. This paper provides complete reproducible experiments, ablation studies, and error-similarity analysis while discussing the impact of data noise and method limitations.
LGNov 26, 2024
Can artificial intelligence predict clinical trial outcomes?Shuyi Jin, Lu Chen, Hongru Ding et al.
This study evaluates the performance of large language models (LLMs) and the HINT model in predicting clinical trial outcomes, focusing on metrics including Balanced Accuracy, Matthews Correlation Coefficient (MCC), Recall, and Specificity. Results show that GPT-4o achieves superior overall performance among LLMs but, like its counterparts (GPT-3.5, GPT-4mini, Llama3), struggles with identifying negative outcomes. In contrast, HINT excels in negative sample recognition and demonstrates resilience to external factors (e.g., recruitment challenges) but underperforms in oncology trials, a major component of the dataset. LLMs exhibit strengths in early-phase trials and simpler endpoints like Overall Survival (OS), while HINT shows consistency across trial phases and excels in complex endpoints (e.g., Objective Response Rate). Trial duration analysis reveals improved model performance for medium- to long-term trials, with GPT-4o and HINT displaying stability and enhanced specificity, respectively. We underscore the complementary potential of LLMs (e.g., GPT-4o, Llama3) and HINT, advocating for hybrid approaches to leverage GPT-4o's predictive power and HINT's specificity in clinical trial outcome forecasting.
LGMay 22, 2025
LengthLogD: A Length-Stratified Ensemble Framework for Enhanced Peptide Lipophilicity Prediction via Multi-Scale Feature IntegrationShuang Wu, Meijie Wang, Lun Yu
Peptide compounds demonstrate considerable potential as therapeutic agents due to their high target affinity and low toxicity, yet their drug development is constrained by their low membrane permeability. Molecular weight and peptide length have significant effects on the logD of peptides, which in turn influences their ability to cross biological membranes. However, accurate prediction of peptide logD remains challenging due to the complex interplay between sequence, structure, and ionization states. This study introduces LengthLogD, a predictive framework that establishes specialized models through molecular length stratification while innovatively integrating multi-scale molecular representations. We constructed feature spaces across three hierarchical levels: atomic (10 molecular descriptors), structural (1024-bit Morgan fingerprints), and topological (3 graph-based features including Wiener index), optimized through stratified ensemble learning. An adaptive weight allocation mechanism specifically developed for long peptides significantly enhances model generalizability. Experimental results demonstrate superior performance across all categories: short peptides (R^2=0.855), medium peptides (R^2=0.816), and long peptides (R^2=0.882), with a 34.7% reduction in prediction error for long peptides compared to conventional single-model approaches. Ablation studies confirm: 1) The length-stratified strategy contributes 41.2% to performance improvement; 2) Topological features account for 28.5% of predictive importance. Compared to state-of-the-art models, our method maintains short peptide prediction accuracy while achieving a 25.7% increase in the coefficient of determination (R^2) for long peptides. This research provides a precise logD prediction tool for peptide drug development, particularly demonstrating unique value in optimizing long peptide lead compounds.
LGMay 9, 2025
BMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct DetectionYize Zhou, Jie Zhang, Meijie Wang et al.
Academic misconduct detection in biomedical research remains challenging due to algorithmic narrowness in existing methods and fragmented analytical pipelines. We present BMDetect, a multimodal deep learning framework that integrates journal metadata (SJR, institutional data), semantic embeddings (PubMedBERT), and GPT-4o-mined textual attributes (methodological statistics, data anomalies) for holistic manuscript evaluation. Key innovations include: (1) multimodal fusion of domain-specific features to reduce detection bias; (2) quantitative evaluation of feature importance, identifying journal authority metrics (e.g., SJR-index) and textual anomalies (e.g., statistical outliers) as dominant predictors; and (3) the BioMCD dataset, a large-scale benchmark with 13,160 retracted articles and 53,411 controls. BMDetect achieves 74.33% AUC, outperforming single-modality baselines by 8.6%, and demonstrates transferability across biomedical subfields. This work advances scalable, interpretable tools for safeguarding research integrity.