Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language
This work addresses the need for more flexible and data-efficient activity prediction models in drug discovery. The contribution is partly incremental, since it builds on existing language model capabilities, but it combines them with a new architecture.
Activity prediction models in drug discovery normally have to be trained or fine-tuned for each new task. The authors instead propose a model that adapts at inference time from a textual description of the task, and they report improved performance on few-shot and zero-shot benchmarks.
Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently they have to be trained or fine-tuned for new tasks. Without training or fine-tuning, scientific language models could be used for such low-data tasks through their announced zero- and few-shot capabilities. However, their predictive quality at activity prediction is lacking. In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural language inputs, and a contrastive pre-training objective on data from large biochemical databases. In extensive experiments, we show that our method CLAMP yields improved predictive performance on few-shot learning benchmarks and zero-shot problems in drug discovery. We attribute the advances of our method to the modularized architecture and to our pre-training objective.
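The idea of separate modules for chemical and natural language inputs, trained with a contrastive objective, can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the CLAMP implementation: the names `make_encoder`, `activity_score`, and `contrastive_loss` are hypothetical, the encoders are random linear projections standing in for learned networks, the inputs are plain feature vectors, and the loss is a generic binary cross-entropy over embedding similarity, which is only one plausible form of such an objective.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def make_encoder(in_dim, out_dim, seed):
    """Return a toy encoder: a fixed random linear map followed by L2 normalization.

    Assumption: a real model would use a learned network per modality; the
    random weights here only demonstrate the two-module layout.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(x):
        projected = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        return l2_normalize(projected)

    return encode


def activity_score(mol_emb, task_emb, temperature=0.1):
    """Map the similarity of a molecule and a task embedding to a value in (0, 1)."""
    similarity = sum(a * b for a, b in zip(mol_emb, task_emb))
    return 1.0 / (1.0 + math.exp(-similarity / temperature))


def contrastive_loss(triples):
    """Mean binary cross-entropy over (mol_emb, task_emb, label) triples.

    Pairs labeled 1 (active) are pulled together in the shared space,
    pairs labeled 0 (inactive) are pushed apart.
    """
    eps = 1e-12
    total = 0.0
    for mol_emb, task_emb, label in triples:
        p = activity_score(mol_emb, task_emb)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps)
    return total / len(triples)


# Separate modules for the two modalities, projecting into one shared space.
encode_molecule = make_encoder(in_dim=8, out_dim=4, seed=0)  # e.g. fingerprint bits
encode_task = make_encoder(in_dim=6, out_dim=4, seed=1)      # e.g. text features

molecule = encode_molecule([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
task = encode_task([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])

# Zero-shot use: score a molecule against a task description with no task-specific training.
score = activity_score(molecule, task)
```

Because the task enters only through its embedding, a new task requires nothing more than encoding its description, which is what allows this kind of model to be applied at inference time without retraining.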