AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents
AutoCT addresses the need for interpretable and efficient clinical trial prediction in biomedical research, offering a scalable approach that could reduce the time and expense of drug development.
The paper tackles the problem of predicting clinical trial outcomes to reduce costs and accelerate drug discovery, proposing AutoCT, a framework that combines large language models with classical machine learning for interpretability, achieving performance on par with or better than state-of-the-art methods.
Clinical trials are critical for advancing medical treatments but remain prohibitively expensive and time-consuming. Accurate prediction of clinical trial outcomes can significantly reduce research and development costs and accelerate drug discovery. While recent deep learning models have shown promise by leveraging unstructured data, their black-box nature, lack of interpretability, and vulnerability to label leakage limit their practical use in high-stakes biomedical contexts. In this work, we propose AutoCT, a novel framework that combines the reasoning capabilities of large language models with the explainability of classical machine learning. AutoCT autonomously generates, evaluates, and refines tabular features based on public information without human input. Our method uses Monte Carlo Tree Search to iteratively optimize predictive performance. Experimental results show that AutoCT performs on par with or better than state-of-the-art methods on clinical trial prediction tasks within only a limited number of self-refinement iterations, establishing a new paradigm for scalable, interpretable, and cost-efficient clinical trial prediction.
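To make the search loop described above concrete, the following is a minimal sketch of Monte Carlo Tree Search over sets of tabular features. It is not the authors' implementation. Everything named here is an assumption made for illustration: the candidate feature list stands in for features an LLM would propose, the weights are invented, and `evaluate` is a toy replacement for training a classical model and measuring its validation performance.

```python
import math
import random

# Hypothetical candidate features; in AutoCT these would be proposed by an LLM.
CANDIDATES = ["phase", "enrollment", "num_sites", "sponsor_type", "has_placebo"]

# Invented weights: a toy stand-in for how much each feature helps a model.
_WEIGHTS = {"phase": 0.10, "enrollment": 0.05, "num_sites": 0.02,
            "sponsor_type": 0.08, "has_placebo": 0.005}


def evaluate(features):
    """Toy score for a feature set (assumption: replaces model training).

    Each feature adds its weight; a small per-feature penalty discourages
    adding features that do not pay for themselves.
    """
    gain = sum(_WEIGHTS[f] for f in sorted(features))
    return 0.5 + gain - 0.01 * len(features)


class Node:
    """One search state: the set of features selected so far."""

    def __init__(self, features, parent=None):
        self.features = features   # frozenset of feature names
        self.parent = parent
        self.children = {}         # added feature name -> child Node
        self.visits = 0
        self.total = 0.0           # sum of scores backed up through this node

    def untried(self):
        """Features that could still be added and have no child node yet."""
        return [f for f in CANDIDATES
                if f not in self.features and f not in self.children]

    def ucb_child(self, c=1.4):
        """Pick the child with the highest UCB1 value."""
        def ucb(n):
            explore = math.sqrt(math.log(self.visits) / n.visits)
            return n.total / n.visits + c * explore
        return max(self.children.values(), key=ucb)


def mcts(iterations=50, seed=0):
    """Return (best_score, best_feature_set) found within the budget."""
    rng = random.Random(seed)
    root = Node(frozenset())
    best = (evaluate(root.features), root.features)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while not node.untried() and node.children:
            node = node.ucb_child()
        # Expansion: add one new feature, if any remain.
        options = node.untried()
        if options:
            feature = rng.choice(options)
            child = Node(node.features | {feature}, parent=node)
            node.children[feature] = child
            node = child
        # Evaluation: score the feature set directly (no random rollout).
        score = evaluate(node.features)
        if score > best[0]:
            best = (score, node.features)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best


if __name__ == "__main__":
    score, features = mcts(iterations=50, seed=0)
    print(round(score, 3), sorted(features))
```

In this sketch the selected features stay human-readable at every step, which is the property the abstract attributes to pairing LLM-generated features with a classical model. A real system would replace `evaluate` with cross-validated model training and replace the fixed candidate list with LLM proposals.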