SMILES-Mamba: Chemical Mamba Foundation Models for Drug ADMET Prediction
SMILES-Mamba addresses the challenge of accurate ADMET prediction in drug discovery and reduces dependence on large labeled datasets, but the contribution is incremental because it applies existing self-supervised learning methods to this domain.
The paper tackles ADMET property prediction for small-molecule drugs, a resource-intensive step in drug discovery, by proposing SMILES-Mamba, a two-stage model that combines self-supervised pretraining with fine-tuning. The model achieves competitive performance across 22 ADMET datasets, with the highest score on 14 tasks.
In drug discovery, predicting the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of small-molecule drugs is critical for ensuring safety and efficacy. However, the process of accurately predicting these properties is often resource-intensive and requires extensive experimental data. To address this challenge, we propose SMILES-Mamba, a two-stage model that leverages both unlabeled and labeled data through a combination of self-supervised pretraining and fine-tuning strategies. The model first pre-trains on a large corpus of unlabeled SMILES strings to capture the underlying chemical structure and relationships, before being fine-tuned on smaller, labeled datasets specific to ADMET tasks. Our results demonstrate that SMILES-Mamba exhibits competitive performance across 22 ADMET datasets, achieving the highest score in 14 tasks, highlighting the potential of self-supervised learning in improving molecular property prediction. This approach not only enhances prediction accuracy but also reduces the dependence on large, labeled datasets, offering a promising direction for future research in drug discovery.
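To make the data side of the two-stage setup concrete, the following is a minimal sketch of how unlabeled SMILES strings might be turned into self-supervised training examples. The abstract does not state the pretraining objective, so next-token prediction is assumed here purely for illustration. The token pattern, the special tokens, and the helper names (`tokenize`, `build_vocab`, `next_token_pairs`) are hypothetical and not taken from the paper.

```python
import re

# Hypothetical, simplified SMILES token pattern: bracket atoms such as [NH3+],
# the two-letter halogens Br and Cl, two-digit ring closures such as %12,
# and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(corpus):
    """Map every token seen in the corpus to an integer id."""
    # Special tokens are an assumption of this sketch.
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for token in sorted({t for s in corpus for t in tokenize(s)}):
        vocab[token] = len(vocab)
    return vocab


def next_token_pairs(smiles, vocab):
    """Build (inputs, targets) for an assumed next-token objective.

    The targets are the inputs shifted left by one position, so no
    label beyond the SMILES string itself is needed.
    """
    ids = [vocab["<bos>"]] + [vocab[t] for t in tokenize(smiles)] + [vocab["<eos>"]]
    return ids[:-1], ids[1:]


# Stage 1 (pretraining data): unlabeled SMILES only.
unlabeled = ["CCO", "c1ccccc1", "CC(=O)Cl", "C[NH3+]"]
vocab = build_vocab(unlabeled)
inputs, targets = next_token_pairs("CCO", vocab)

# Stage 2 (fine-tuning data): the same token ids paired with a task label.
# The label value below is a placeholder, not a measured ADMET property.
labeled_example = ([vocab[t] for t in tokenize("CCO")], 0.0)
```

In this sketch the same tokenizer and vocabulary serve both stages: pretraining derives its targets from the sequence itself, while fine-tuning attaches an external label to each encoded molecule. A sequence model would consume `inputs` and be trained to predict `targets`. The model itself is omitted here.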