LG AI DB · Mar 30

A Latent Risk-Aware Machine Learning Approach for Predicting Operational Success in Clinical Trials based on TrialsBank

Iness Halimi, Emmanuel Piffo, Oumnia Boudersa, Yvan Marcel Carre Vilmorin, Melissa Ait-ikhlef, Karima Kone, Andy Tan, Augustin Medina, Juliette Hernando, Sheila Ernest, Vatche Bartekian, Karine Lalonde

arXiv:2603.29041 · 34.3 · h-index: 7

Predicted impact: top 69% in LG · last 90 days
Originality: Incremental advance

AI Analysis

For pharmaceutical companies and clinical researchers, this framework enables early risk assessment of trial operational success using pre-trial features, supporting data-driven decision-making to reduce costs and timelines.

The paper presents a hierarchical latent risk-aware machine learning framework that predicts clinical trial operational success using 13,700 trials from TrialsBank, achieving F1-scores of 0.93, 0.92, and 0.91 for Phase I-III, demonstrating that prospective forecasting of operational success is feasible.
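
For readers less familiar with the reported metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a binary label in plain Python. It assumes that operational success is the positive class (label 1) and that the reported scores are plain binary F1; the summary does not state either, and the labels in the example are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (1 = operational success, 0 = failure).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))  # 3 TP, 1 FP, 1 FN -> precision = recall = 0.75
```

Because F1 ignores true negatives, a high score on the success class does not by itself show how well failures are detected, which is one reason to also look at per-class results.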

Clinical trials are characterized by high costs, extended timelines, and substantial operational risk, yet reliable prospective methods for predicting trial success before initiation remain limited. Existing artificial intelligence approaches often focus on isolated metrics or specific development stages and frequently rely on variables unavailable at the trial design phase, limiting real-world applicability. We present a hierarchical latent risk-aware machine learning framework for prospective prediction of clinical trial operational success using a curated subset of TrialsBank, a proprietary AI-ready database developed by Sorintellis, comprising 13,700 trials. Operational success was defined as the ability to initiate, conduct, and complete a clinical trial according to planned timelines, recruitment targets, and protocol specifications through database lock. This approach decomposes operational success prediction into two modeling stages. First, intermediate latent operational risk factors are predicted using more than 180 drug- and trial-level features available before trial initiation. These predicted latent risks are then integrated into a downstream model to estimate the probability of operational success. A staged data-splitting strategy was employed to prevent information leakage, and models were benchmarked using XGBoost, CatBoost, and Explainable Boosting Machines. Across Phase I-III, the framework achieves strong out-of-sample performance, with F1-scores of 0.93, 0.92, and 0.91, respectively. Incorporating latent risk drivers improves discrimination of operational failures, and performance remains robust under independent inference evaluation. These results demonstrate that clinical trial operational success can be prospectively forecasted using a latent risk-aware AI framework, enabling early risk assessment and supporting data-driven clinical development decision-making.
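
The abstract does not spell out the staged data-splitting strategy, so the following is only a minimal sketch of one way such a split could work, not the authors' procedure. It assumes three disjoint partitions: one to fit the latent-risk models, one to fit the downstream success model on predicted latent risks, and one held out from both stages for evaluation. The function name `staged_split` and the 50/30/20 proportions are hypothetical.

```python
import random


def staged_split(n_trials, fractions=(0.5, 0.3, 0.2), seed=0):
    """Partition trial indices into three disjoint sets (hypothetical scheme).

    stage1: fit the latent-risk models on pre-trial features
    stage2: fit the success model, using latent risks *predicted* by stage 1
    test:   never seen by either stage; used only for final evaluation
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_trials))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n1 = int(fractions[0] * n_trials)
    n2 = int(fractions[1] * n_trials)
    return {
        "stage1": indices[:n1],
        "stage2": indices[n1:n1 + n2],
        "test": indices[n1 + n2:],  # remainder, so every trial is assigned
    }


splits = staged_split(13_700)  # 13,700 trials, as stated in the abstract
for name, idx in splits.items():
    print(name, len(idx))
```

The point of keeping the partitions disjoint is that the downstream model only ever sees latent-risk predictions made on trials the first-stage models were not trained on, which mirrors what happens at inference time for a new trial.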

