CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer
This work provides a standardized dataset that lets researchers in computational drug discovery benchmark graph machine learning methods, although the contribution is incremental because it builds on existing data and baselines.
The authors tackled the problem of anti-cancer drug discovery by creating CandidateDrug4Cancer, an open benchmark dataset with 54,869 cancer-related drug molecules across 29 targets, and found it presents significant challenges for graph learning methods in drug-target interaction prediction.
Because anti-cancer drug discoveries have been serendipitous, we sought to present the Open Molecular Graph Learning Benchmark, named CandidateDrug4Cancer, a challenging and realistic benchmark dataset to facilitate scalable, robust, and reproducible graph machine learning research for anti-cancer drug discovery. The CandidateDrug4Cancer dataset covers the 29 most-mentioned cancer targets and 54,869 cancer-related drug molecules, ranging from pre-clinical and clinical candidates to FDA-approved drugs. Besides building the dataset, we perform benchmark experiments with effective drug-target interaction (DTI) prediction baselines that use molecular descriptors and expressive graph neural networks. Experimental results suggest that CandidateDrug4Cancer poses significant challenges for learning over molecular graphs and targets in practical applications, indicating opportunities for future research on developing candidate drugs for treating cancer.
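To make the "molecular graph plus target" setup concrete, here is a minimal sketch of how a graph-based DTI predictor can score one molecule against one target. It is not the authors' baseline. The sketch assumes that each target is treated as a separate binary interaction task, that atoms carry a toy one-hot element feature, and that the model applies one GIN-style sum-aggregation step followed by a sum readout and a per-target logistic head. The element vocabulary, the weights, and the target name `target_A` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical toy atom vocabulary; real featurizers use many more atom properties.
ELEMENTS = ["C", "N", "O"]


def one_hot(symbol: str) -> List[float]:
    """One-hot encode an element symbol over the toy ELEMENTS vocabulary."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def message_pass(features: List[List[float]],
                 bonds: List[Tuple[int, int]]) -> List[List[float]]:
    """One GIN-style step without an MLP: h_v' = h_v + sum of neighbour features.

    Bonds are undirected, so each bond sends a message in both directions.
    """
    out = [list(f) for f in features]  # start from each atom's own features
    for i, j in bonds:
        for k in range(len(features[i])):
            out[i][k] += features[j][k]
            out[j][k] += features[i][k]
    return out


def readout(features: List[List[float]]) -> List[float]:
    """Sum-pool atom features into a single graph-level vector."""
    return [sum(col) for col in zip(*features)]


def predict_interaction(graph_vec: List[float],
                        weights: List[float], bias: float) -> float:
    """Logistic score that the molecule interacts with one (hypothetical) target."""
    z = sum(w * x for w, x in zip(weights, graph_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Usage: ethanol's heavy atoms C-C-O, scored against one made-up target head.
atoms = [one_hot(s) for s in ["C", "C", "O"]]
bonds = [(0, 1), (1, 2)]
graph_vec = readout(message_pass(atoms, bonds))
target_heads = {"target_A": ([0.1, 0.0, -0.2], 0.0)}  # hypothetical weights
w, b = target_heads["target_A"]
score = predict_interaction(graph_vec, w, b)
print(graph_vec, round(score, 3))
```

In a benchmark with many targets, the graph encoder can be shared while each target gets its own head (or its own model). Descriptor-based baselines would replace `message_pass` and `readout` with a fixed fingerprint vector.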