CEDec 1, 2022
Re-evaluating sample efficiency in de novo molecule generationMorgan Thomas, Noel M. O'Boyle, Andreas Bender et al.
De novo molecule generation can suffer from data inefficiency; requiring large amounts of training data or many sampled data points to conduct objective optimization. The latter is a particular disadvantage when combining deep generative models with computationally expensive molecule scoring functions (a.k.a. oracles) commonly used in computer-aided drug design. Recent works have therefore focused on methods to improve sample efficiency in the context of de novo molecule drug design, or to benchmark it. In this work, we discuss and adapt a recent sample efficiency benchmark to better reflect realistic goals also with respect to the quality of chemistry generated, which must always be considered in the context of small-molecule drug design; we then re-evaluate all benchmarked generative models. We find that accounting for molecular weight and LogP with respect to the training data, and the diversity of chemistry proposed, re-orders the ranking of generative models. In addition, we benchmark a recently proposed method to improve sample efficiency (Augmented Hill-Climb) and found it ranked top when considering both the sample efficiency and chemistry of molecules generated. Continual improvements in sample efficiency and chemical desirability enable more routine integration of computationally expensive scoring functions on a more realistic timescale.
MLOct 23, 2023
Evaluating machine learning models in non-standard settings: An overview and new findingsRoman Hornung, Malte Nalenz, Lennart Schneider et al.
Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to which the model will be applied, while the training data should be representative of the entire data set used to obtain the final model. Beyond providing an overview, we address literature gaps by conducting simulation studies. These studies assess the necessity of using GE-estimation methods tailored to the respective setting. Our findings corroborate the concern that standard resampling methods often yield biased GE estimates in non-standard settings, underscoring the importance of tailored GE estimation.
MLOct 17, 2022
Conditional Neural Processes for MoleculesMiguel Garcia-Ortegon, Andreas Bender, Sergio Bacallado
Neural processes (NPs) are models for transfer learning with properties reminiscent of Gaussian Processes (GPs). They are adept at modelling data consisting of few observations of many related functions on the same input space and are trained by minimizing a variational objective, which is computationally much less expensive than the Bayesian updating required by GPs. So far, most studies of NPs have focused on low-dimensional datasets which are not representative of realistic transfer learning tasks. Drug discovery is one application area that is characterized by datasets consisting of many chemical properties or functions which are sparsely observed, yet depend on shared features or representations of the molecular inputs. This paper applies the conditional neural process (CNP) to DOCKSTRING, a dataset of docking scores for benchmarking ML models. CNPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in chemoinformatics, as well as an alternative model for transfer learning based on pre-training and refining neural network regressors. We present a Bayesian optimization experiment which showcases the probabilistic nature of CNPs and discuss shortcomings of the model in uncertainty quantification.
MLMay 25, 2022
Factorized Structured Regression for Large-Scale Varying Coefficient ModelsDavid Rügamer, Andreas Bender, Simon Wiegrebe et al.
Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structured regression models, including time-aware varying coefficients models, are, however, limited in their applicability to categorical effects and inclusion of a large number of interactions. Here, we propose Factorized Structured Regression (FaStR) for scalable varying coefficient models. FaStR overcomes limitations of general regression models for large-scale data by combining structured additive regression and factorization approaches in a neural network-based model implementation. This fusion provides a scalable framework for the estimation of statistical models in previously infeasible data settings. Empirical results confirm that the estimation of varying coefficients of our approach is on par with state-of-the-art regression techniques, while scaling notably better and also being competitive with other time-aware RS in terms of prediction performance. We illustrate FaStR's performance and interpretability on a large-scale behavioral study with smartphone user data.
STDec 10, 2022
Examining marginal properness in the external validation of survival models with squared and logarithmic lossesRaphael Sonabend, John Zobolas, Riccardo Be Bin et al.
Scoring rules promote rational and honest decision-making, which is important for model evaluation and becoming increasingly important for automated procedures such as `AutoML'. In this paper we survey common squared and logarithmic scoring rules for survival analysis, with a focus on their theoretical and empirical properness. We introduce a marginal definition of properness and show that both the Integrated Survival Brier Score (ISBS) and the Right-Censored Log-Likelihood (RCLL) are theoretically improper under this definition. We also investigate a new class of losses that may inform future survival scoring rules. Simulation experiments reveal that both the ISBS and RCLL behave as proper scoring rules in practice. The RCLL showed no violations across all settings, while ISBS exhibited only minor, negligible violations at extremely small sample sizes, suggesting one can trust results from historical experiments. As such we advocate for both the RCLL and ISBS in external validation of models, including in automated procedures. However, we note practical challenges in estimating these losses including estimation of censoring distributions and densities; as such further research is required to advance development of robust and honest evaluation in survival analysis.
CVFeb 12
Semantically Conditioned Diffusion Models for Cerebral DSA SynthesisQiwen Xu, David Rügamer, Holger Wenz et al.
Digital subtraction angiography (DSA) plays a central role in the diagnosis and treatment of cerebrovascular disease, yet its invasive nature and high acquisition cost severely limit large-scale data collection and public data sharing. Therefore, we developed a semantically conditioned latent diffusion model (LDM) that synthesizes arterial-phase cerebral DSA frames under explicit control of anatomical circulation (anterior vs.\ posterior) and canonical C-arm positions. We curated a large single-centre DSA dataset of 99,349 frames and trained a conditional LDM using text embeddings that encoded anatomy and acquisition geometry. To assess clinical realism, four medical experts, including two neuroradiologists, one neurosurgeon, and one internal medicine expert, systematically rated 400 synthetic DSA images using a 5-grade Likert scale for evaluating proximal large, medium, and small peripheral vessels. The generated images achieved image-wise overall Likert scores ranging from 3.1 to 3.3, with high inter-rater reliability (ICC(2,k) = 0.80--0.87). Distributional similarity to real DSA frames was supported by a low median Fréchet inception distance (FID) of 15.27. Our results indicate that semantically controlled LDMs can produce realistic synthetic DSAs suitable for downstream algorithm development, research, and training.
LGOct 31, 2025
AI Agents in Drug DiscoverySrijit Seal, Dinh Long Huynh, Moudather Chelbi et al.
Artificial intelligence (AI) agents are emerging as transformative tools in drug discovery, with the ability to autonomously reason, act, and learn through complicated research workflows. Building on large language models (LLMs) coupled with perception, computation, action, and memory tools, these agentic AI systems could integrate diverse biomedical data, execute tasks, carry out experiments via robotic platforms, and iteratively refine hypotheses in closed loops. We provide a conceptual and technical overview of agentic AI architectures, ranging from ReAct and Reflection to Supervisor and Swarm systems, and illustrate their applications across key stages of drug discovery, including literature synthesis, toxicity prediction, automated protocol generation, small-molecule synthesis, drug repurposing, and end-to-end decision-making. To our knowledge, this represents the first comprehensive work to present real-world implementations and quantifiable impacts of agentic AI systems deployed in operational drug discovery settings. Early implementations demonstrate substantial gains in speed, reproducibility, and scalability, compressing workflows that once took months into hours while maintaining scientific traceability. We discuss the current challenges related to data heterogeneity, system reliability, privacy, and benchmarking, and outline future directions towards technology in support of science and translation.
MLJun 6, 2024Code
A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional DataLukas Burk, John Zobolas, Bernd Bischl et al.
This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are smaller in scale regarding the number of used datasets and extent of empirical evaluation. They often lack appropriate tuning or evaluation procedures, while other comparison studies focus on qualitative reviews rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable guidelines for practitioners. We benchmark 19 models, ranging from classical statistical approaches to many common machine learning methods, on 34 publicly available datasets. The benchmark tunes models using both a discrimination measure (Harrell's C-index) and a scoring rule (Integrated Survival Brier Score), and evaluates them across six metrics covering discrimination, calibration, and overall predictive performance. Despite superior average ranks in overall predictive performance from individual learners like oblique random survival forests and likelihood-based boosting, and better discrimination rankings from multiple boosting- and tree-based methods as well as parametric survival models, no method significantly outperforms the commonly used Cox proportional hazards model for either tuning measure. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox Proportional Hazards model remains a simple and robust method, sufficient for most practitioners. All code, data, and results are publicly available on GitHub https://github.com/slds-lmu/paper_2023_survival_benchmark
MLMay 24, 2023Code
Deep Learning for Survival Analysis: A ReviewSimon Wiegrebe, Philipp Kopper, Raphael Sonabend et al.
The influx of deep learning (DL) techniques into the field of survival analysis in recent years has led to substantial methodological progress; for instance, learning from unstructured or high-dimensional data such as images, text or omics data. In this work, we conduct a comprehensive systematic review of DL-based methods for time-to-event analysis, characterizing them according to both survival- and DL-related attributes. In summary, the reviewed methods often address only a small subset of tasks relevant to time-to-event data - e.g., single-risk right-censored data - and neglect to incorporate more complex settings. Our findings are summarized in an editable, open-source, interactive table: https://survival-org.github.io/DL4Survival. As this research area is advancing rapidly, we encourage community contribution in order to keep this database up to date.
MLDec 9, 2021Code
Avoiding C-hacking when evaluating survival distribution predictions with discrimination measuresRaphael Sonabend, Andreas Bender, Sebastian Vollmer
In this paper we consider how to evaluate survival distribution predictions with measures of discrimination. This is a non-trivial problem as discrimination measures are the most commonly used in survival analysis and yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in literature and software and consider their respective advantages and disadvantages. Whilst distributions are frequently evaluated by discrimination measures, we find that the method for doing so is rarely described in the literature and often leads to unfair comparisons. We find that the most robust method of reducing a distribution to a risk is to sum over the predicted cumulative hazard. We recommend that machine learning survival analysis software implements clear transformations between distribution and risk predictions in order to allow more transparent and accessible model evaluation. The code used in the final experiment is available at https://github.com/RaphaelS1/distribution_discrimination.
MLOct 29, 2021Code
DOCKSTRING: easy molecular docking yields better benchmarks for ligand designMiguel García-Ortegón, Gregor N. C. Simm, Austin J. Tripp et al.
The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate's interaction with the target. By contrast, molecular docking is a widely successful method in drug discovery to estimate binding affinities. However, docking simulations require a significant amount of domain knowledge to set up correctly which hampers adoption. To this end, we present DOCKSTRING, a bundle for meaningful and robust comparison of ML models consisting of three components: (1) an open-source Python package for straightforward computation of docking scores; (2) an extensive dataset of docking scores and poses of more than 260K ligands for 58 medically-relevant targets; and (3) a set of pharmaceutically-relevant benchmark tasks including regression, virtual screening, and de novo design. The Python package implements a robust ligand and target preparation protocol that allows non-experts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more appropriate evaluation objective than simple physicochemical properties, yielding more realistic benchmark tasks and molecular candidates.
QMAug 9, 2019Code
Concepts and Applications of Conformal Prediction in Computational Drug DiscoveryIsidro Cortés-Ciriano, Andreas Bender
Estimating the reliability of individual predictions is key to increase the adoption of computational models and artificial intelligence in preclinical drug discovery, as well as to foster its application to guide decision making in clinical settings. Among the large number of algorithms developed over the last decades to compute prediction errors, Conformal Prediction (CP) has gained increasing attention in the computational drug discovery community. A major reason for its recent popularity is the ease of interpretation of the computed prediction errors in both classification and regression tasks. For instance, at a confidence level of 90% the true value will be within the predicted confidence intervals in at least 90% of the cases. This so called validity of conformal predictors is guaranteed by the robust mathematical foundation underlying CP. The versatility of CP relies on its minimal computational footprint, as it can be easily coupled to any machine learning algorithm at little computational cost. In this review, we summarize underlying concepts and practical applications of CP with a particular focus on virtual screening and activity modelling, and list open source implementations of relevant software. Finally, we describe the current limitations in the field, and provide a perspective on future opportunities for CP in preclinical and clinical drug discovery.
AIMar 6, 2024
Understanding Biology in the Age of Artificial IntelligenceElsa Lawrence, Adham El-Shazly, Srijit Seal et al. · cambridge, harvard
Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems, primarily centered around the use of machine learning (ML) models. Although ML is undeniably useful for identifying patterns in large, complex data sets, its widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. As such, the interplay between these models and scientific understanding in biology is a topic with important implications for the future of scientific research, yet it is a subject that has received little attention. Here, we draw from an epistemological toolkit to contextualize recent applications of ML in biological sciences under modern philosophical theories of understanding, identifying general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge. We propose that conceptions of scientific understanding as information compression, qualitative intelligibility, and dependency relation modelling provide a useful framework for interpreting ML-mediated understanding of biological systems. Through a detailed analysis of two key application areas of ML in modern biological research - protein structure prediction and single cell RNA-sequencing - we explore how these features have thus far enabled ML systems to advance scientific understanding of their target phenomena, how they may guide the development of future ML models, and the key obstacles that remain in preventing ML from achieving its potential as a tool for biological discovery. Consideration of the epistemological features of ML applications in biology will improve the prospects of these methods to solve important problems and advance scientific understanding of living systems.
MLAug 7, 2025
Reduction Techniques for Survival AnalysisJohannes Piller, Léa Orsini, Simon Wiegrebe et al.
In this work, we discuss what we refer to as reduction techniques for survival analysis, that is, techniques that "reduce" a survival task to a more common regression or classification task, without ignoring the specifics of survival data. Such techniques particularly facilitate machine learning-based survival analysis, as they allow for applying standard tools from machine and deep learning to many survival tasks without requiring custom learners. We provide an overview of different reduction techniques and discuss their respective strengths and weaknesses. We also provide a principled implementation of some of these reductions, such that they are directly available within standard machine learning workflows. We illustrate each reduction using dedicated examples and perform a benchmark analysis that compares their predictive performance to established machine learning methods for survival analysis.
CLJun 11, 2025
Enhancing Traffic Accident Classifications: Application of NLP Methods for City SafetyEnes Özeren, Alexander Ulbrich, Sascha Filimon et al.
A comprehensive understanding of traffic accidents is essential for improving city safety and informing policy decisions. In this study, we analyze traffic incidents in Munich to identify patterns and characteristics that distinguish different types of accidents. The dataset consists of both structured tabular features, such as location, time, and weather conditions, as well as unstructured free-text descriptions detailing the circumstances of each accident. Each incident is categorized into one of seven predefined classes. To assess the reliability of these labels, we apply NLP methods, including topic modeling and few-shot learning, which reveal inconsistencies in the labeling process. These findings highlight potential ambiguities in accident classification and motivate a refined predictive approach. Building on these insights, we develop a classification model that achieves high accuracy in assigning accidents to their respective categories. Our results demonstrate that textual descriptions contain the most informative features for classification, while the inclusion of tabular data provides only marginal improvements. These findings emphasize the critical role of free-text data in accident analysis and highlight the potential of transformer-based models in improving classification reliability.
LGMar 19, 2024
On Training Survival Models with Scoring RulesPhilipp Kopper, David Rügamer, Raphael Sonabend et al.
Scoring rules are an established way of comparing predictive performances across model classes. In the context of survival analysis, they require adaptation in order to accommodate censoring. This work investigates using scoring rules for model training rather than evaluation. Doing so, we establish a general framework for training survival models that is model agnostic and can learn event time distributions parametrically or non-parametrically. In addition, our framework is not restricted to any specific scoring rule. While we focus on neural network-based implementations, we also provide proof-of-concept implementations using gradient boosting, generalized additive models, and trees. Empirical comparisons on synthetic and real-world data indicate that scoring rules can be successfully incorporated into model training and yield competitive predictive performance with established time-to-event models.
MLFeb 12, 2022
DeepPAMM: Deep Piecewise Exponential Additive Mixed Models for Complex Hazard Structures in Survival AnalysisPhilipp Kopper, Simon Wiegrebe, Bernd Bischl et al.
Survival analysis (SA) is an active field of research that is concerned with time-to-event outcomes and is prevalent in many domains, particularly biomedical applications. Despite its importance, SA remains challenging due to small-scale data sets and complex outcome distributions, concealed by truncation and censoring processes. The piecewise exponential additive mixed model (PAMM) is a model class addressing many of these challenges, yet PAMMs are not applicable in high-dimensional feature settings or in the case of unstructured or multimodal data. We unify existing approaches by proposing DeepPAMM, a versatile deep learning framework that is well-founded from a statistical point of view, yet with enough flexibility for modeling complex hazard structures. We illustrate that DeepPAMM is competitive with other machine learning approaches with respect to predictive performance while maintaining interpretability through benchmark experiments and an extended case study.
AIFeb 19, 2021
A Review of Biomedical Datasets Relating to Drug Discovery: A Knowledge Graph PerspectiveStephen Bonner, Ian P Barrett, Cheng Ye et al.
Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KG) have promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritisation. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, whilst relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data is required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorised according to the primary type of information contained within and are considered based upon what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and a evaluation of selected motivating case studies from the literature. Additionally, we raise numerous and unique challenges and issues associated with the domain and its datasets, whilst also highlighting key future research directions. We hope this review will motivate KGs use in solving key and emerging questions in the drug discovery domain.
LGNov 11, 2020
Semi-Structured Deep Piecewise Exponential ModelsPhilipp Kopper, Sebastian Pölsterl, Christian Wachinger et al.
We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The presented framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for estimation of time-varying effects and time-varying features. To also include multiple data sources and higher-order interaction effects into the model, we embed the model class in a neural network and thereby enable the simultaneous estimation of both inherently interpretable structured regression inputs as well as deep neural network components which can potentially process additional unstructured data sources. A proof of concept is provided by using the framework to predict Alzheimer's disease progression based on tabular and 3D point cloud data and applying it to synthetic data.
COAug 18, 2020
mlr3proba: An R Package for Machine Learning in Survival AnalysisRaphael Sonabend, Franz J. Király, Andreas Bender et al.
As machine learning has become increasingly popular over the last few decades, so too has the number of machine learning interfaces for implementing these models. Whilst many R libraries exist for machine learning, very few offer extended support for survival analysis. This is problematic considering its importance in fields like medicine, bioinformatics, economics, engineering, and more. mlr3proba provides a comprehensive machine learning interface for survival analysis and connects with mlr3's general model tuning and benchmarking facilities to provide a systematic infrastructure for survival modeling and evaluation.
MLJun 27, 2020
A General Machine Learning Framework for Survival AnalysisAndreas Bender, David Rügamer, Fabian Scheipl et al.
The modeling of time-to-event data, also known as survival analysis, requires specialized methods that can deal with censoring and truncation, time-varying features and effects, and that extend to settings with multiple competing events. However, many machine learning methods for survival analysis only consider the standard setting with right-censored data and proportional hazards assumption. The methods that do provide extensions usually address at most a subset of these challenges and often require specialized software that can not be integrated into standard machine learning workflows directly. In this work, we present a very general machine learning framework for time-to-event analysis that uses a data augmentation strategy to reduce complex survival tasks to standard Poisson regression tasks. This reformulation is based on well developed statistical theory. With the proposed approach, any algorithm that can optimize a Poisson (log-)likelihood, such as gradient boosted trees, deep neural networks, model-based boosting and many more can be used in the context of time-to-event analysis. The proposed technique does not require any assumptions with respect to the distribution of event times or the functional shapes of feature and interaction effects. Based on the proposed framework we develop new methods that are competitive with specialized state of the art approaches in terms of accuracy, and versatility, but with comparatively small investments of programming effort or requirements for specialized methodological know-how.
LGApr 12, 2019
Reliable Prediction Errors for Deep Neural Networks Using Test-Time DropoutIsidro Cortes-Ciriano, Andreas Bender
While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g. precision medicine. Here, we present a framework to compute reliable errors in prediction for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network using dropout, and then applying it N times to both the validation and test sets, also employing dropout in this step. Therefore, for each instance in the validation and test sets an ensemble of predictions were generated. The residuals and absolute errors in prediction for the validation set were then used to compute prediction errors for test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower set of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, and to thereby generate reliable prediction errors for deep Neural Networks.
CVNov 22, 2018
KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound imagesIsidro Cortes Ciriano, Andreas Bender
The application of convolutional neural networks (ConvNets) to harness high-content screening images or 2D compound representations is gaining increasing attention in drug discovery. However, existing applications often require large data sets for training, or sophisticated pretraining schemes. Here, we show using 33 IC50 data sets from ChEMBL 23 that the in vitro activity of compounds on cancer cell lines and protein targets can be accurately predicted on a continuous scale from their Kekule structure representations alone by extending existing architectures, which were pretrained on unrelated image data sets. We show that the predictive power of the generated models is comparable to that of Random Forest (RF) models and fully-connected Deep Neural Networks trained on circular (Morgan) fingerprints. Notably, including additional fully-connected layers further increases the predictive power of the ConvNets by up to 10%. Analysis of the predictions generated by RF models and ConvNets shows that by simply averaging the output of the RF models and ConvNets we obtain significantly lower errors in prediction for multiple data sets, although the effect size is small, than those obtained with either model alone, indicating that the features extracted by the convolutional layers of the ConvNets provide complementary predictive signal to Morgan fingerprints. Lastly, we show that multi-task ConvNets trained on compound images permit to model COX isoform selectivity on a continuous scale with errors in prediction comparable to the uncertainty of the data. Overall, in this work we present a set of ConvNet architectures for the prediction of compound activity from their Kekule structure representations with state-of-the-art performance, that require no generation of compound descriptors or use of sophisticated image processing techniques.
LGSep 24, 2018
Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural NetworksIsidro Cortes-Ciriano, Andreas Bender
Deep learning architectures have proved versatile in a number of drug discovery applications, including the modelling of in vitro compound activity. While controlling for prediction confidence is essential to increase the trust, interpretability and usefulness of virtual screening models in drug discovery, techniques to estimate the reliability of the predictions generated with deep learning networks remain largely underexplored. Here, we present Deep Confidence, a framework to compute valid and efficient confidence intervals for individual predictions using the deep learning technique Snapshot Ensembling and conformal prediction. Specifically, Deep Confidence generates an ensemble of deep neural networks by recording the network parameters throughout the local minima visited during the optimization phase of a single neural network. This approach serves to derive a set of base learners (i.e., snapshots) with comparable predictive power on average, that will however generate slightly different predictions for a given instance. The variability across base learners and the validation residuals are in turn harnessed to compute confidence intervals using the conformal prediction framework. Using a set of 24 diverse IC50 data sets from ChEMBL 23, we show that Snapshot Ensembles perform on par with Random Forest (RF) and ensembles of independently trained deep neural networks. In addition, we find that the confidence regions predicted using the Deep Confidence framework span a narrower set of values. Overall, Deep Confidence represents a highly versatile error prediction framework that can be applied to any deep learning-based application at no extra computational cost.
BMNov 23, 2014
Target Fishing: A Single-Label or Multi-Label Problem?Avid M. Afzal, Hamse Y. Mussa, Richard E. Turner et al.
According to Cobanoglu et al and Murphy, it is now widely acknowledged that the single target paradigm (one protein or target, one disease, one drug) that has been the dominant premise in drug development in the recent past is untenable. More often than not, a drug-like compound (ligand) can be promiscuous - that is, it can interact with more than one target protein. In recent years, in in silico target prediction methods the promiscuity issue has been approached computationally in different ways. In this study we confine attention to the so-called ligand-based target prediction machine learning approaches, commonly referred to as target-fishing. With a few exceptions, the target-fishing approaches that are currently ubiquitous in cheminformatics literature can be essentially viewed as single-label multi-classification schemes; these approaches inherently bank on the single target paradigm assumption that a ligand can home in on one specific target. In order to address the ligand promiscuity issue, one might be able to cast target-fishing as a multi-label multi-class classification problem. For illustrative and comparison purposes, single-label and multi-label Naive Bayes classification models (denoted here by SMM and MMM, respectively) for target-fishing were implemented. The models were constructed and tested on 65,587 compounds and 308 targets retrieved from the ChEMBL17 database. SMM and MMM performed differently: for 16,344 test compounds, the MMM model returned recall and precision values of 0.8058 and 0.6622, respectively; the corresponding recall and precision values yielded by the SMM model were 0.7805 and 0.7596, respectively. However, at a significance level of 0.05 and one degree of freedom McNemar test performed on the target prediction results returned by SMM and MMM for the 16,344 test ligands gave a chi-squared value of 15.656, in favour of the MMM approach.