LG GN QM · Apr 30, 2020

A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning

Austin Clyde, Tom Brettin, Alexander Partin, Maulik Shukla, Hyunseung Yoo, Yvonne Evrard, Yitan Zhu, Fangfang Xia, Rick Stevens

arXiv:2005.00095v25.87 citations · Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of improving drug response predictions for cancer treatment, but it is incremental as it focuses on data integration and featurization rather than novel methods.

The paper tackles the problem of predicting cancer drug sensitivity by systematically evaluating featurization techniques with deep learning on large-scale cancer cell line data, training over 35,000 neural network models. It finds that RNA-seq features are highly informative and that SNP count matrices significantly improve performance.

By combining various cancer cell line (CCL) drug screening panels, the size of the data has grown enough to begin understanding how advances in deep learning can advance drug response predictions. In this paper we train >35,000 neural network models, sweeping over common featurization techniques. We found RNA-seq features to be highly redundant, and informative even with subsets larger than 128 features. We found that including single nucleotide polymorphisms (SNPs) coded as count matrices improved model performance significantly, and we found no substantial difference in model performance with respect to molecular featurization between the common open-source Mordred descriptors and Dragon7 descriptors. Alongside this analysis, we outline data integration between CCL screening datasets and present evidence that new metrics and imbalanced data techniques, as well as advances in data standardization, need to be developed.
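To make "SNPs coded as count matrices" concrete, here is a minimal sketch in Python. It assumes one plausible encoding, a cell-line-by-gene matrix whose entries count the SNPs observed in that gene for that cell line. The abstract does not specify the exact scheme, so the function name `snp_count_matrix`, the input layout, and the toy identifiers (`CCL_A`, `CCL_B`) are hypothetical.

```python
from collections import Counter

def snp_count_matrix(calls, cell_lines, genes):
    """Build a cell-line x gene matrix of SNP counts.

    calls: iterable of (cell_line, gene) pairs, one pair per observed SNP.
    Assumption: counting per gene; the paper may aggregate differently.
    """
    counts = Counter(calls)  # missing (cell_line, gene) pairs count as 0
    return [[counts[(cl, g)] for g in genes] for cl in cell_lines]

# Toy data, invented purely for illustration.
calls = [
    ("CCL_A", "TP53"), ("CCL_A", "TP53"), ("CCL_A", "KRAS"),
    ("CCL_B", "KRAS"),
]
cell_lines = ["CCL_A", "CCL_B"]
genes = ["TP53", "KRAS", "EGFR"]

matrix = snp_count_matrix(calls, cell_lines, genes)
for name, row in zip(cell_lines, matrix):
    print(name, row)
```

Each row could then be concatenated with that cell line's RNA-seq features before being fed to a model; whether the authors combine the inputs this way is not stated in the abstract.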
