BM LG ML | Nov 23, 2014

Target Fishing: A Single-Label or Multi-Label Problem?

Avid M. Afzal, Hamse Y. Mussa, Richard E. Turner, Andreas Bender, Robert C. Glen

arXiv:1411.6285v1 | 1 citation

Originality: Incremental advance

AI Analysis

The paper addresses inaccurate drug target predictions caused by the single-target assumption. Its contribution is incremental: it applies multi-label classification to an existing cheminformatics task.

The study tackled the problem of ligand promiscuity in drug development by comparing single-label and multi-label machine learning approaches for target-fishing. The multi-label Naive Bayes model achieved a recall of 0.8058 and a precision of 0.6622, against 0.7805 and 0.7596 for the single-label model, and a McNemar test at the 0.05 significance level favoured the multi-label model.
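The recall and precision pattern reported here (higher recall, lower precision for the multi-label model) can be illustrated with a minimal sketch. It assumes micro-averaging over ligand-target pairs; the listing does not state which averaging the authors used, and the function name, target labels, and toy predictions below are hypothetical.

```python
def micro_precision_recall(true_sets, pred_sets):
    """Micro-averaged precision and recall over per-ligand target sets.

    Assumption: micro-averaging; the source does not specify the scheme.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # predicted targets that are correct
        fp += len(pred - truth)   # predicted targets that are wrong
        fn += len(truth - pred)   # true targets that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: ligand 0 is promiscuous (two true targets).
truth = [{"A", "B"}, {"C"}]

# A single-label model can return at most one target per ligand.
single_label_pred = [{"A"}, {"C"}]

# A multi-label model can return several, at the risk of extra false positives.
multi_label_pred = [{"A", "B", "D"}, {"C"}]

single_precision, single_recall = micro_precision_recall(truth, single_label_pred)
multi_precision, multi_recall = micro_precision_recall(truth, multi_label_pred)
```

In this toy case the single-label predictions miss the second target of the promiscuous ligand, which lowers recall, while the multi-label predictions recover it but add one false positive, which lowers precision.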

According to Cobanoglu et al. and Murphy, it is now widely acknowledged that the single-target paradigm (one protein or target, one disease, one drug), which has been the dominant premise in drug development in the recent past, is untenable. More often than not, a drug-like compound (ligand) can be promiscuous; that is, it can interact with more than one target protein. In recent years, in silico target prediction methods have approached the promiscuity issue computationally in different ways. In this study we confine attention to the so-called ligand-based target prediction machine learning approaches, commonly referred to as target-fishing. With a few exceptions, the target-fishing approaches that are currently ubiquitous in the cheminformatics literature can essentially be viewed as single-label multi-class classification schemes; these approaches inherently bank on the single-target paradigm assumption that a ligand can home in on one specific target. To address the ligand promiscuity issue, one might be able to cast target-fishing as a multi-label multi-class classification problem. For illustrative and comparison purposes, single-label and multi-label Naive Bayes classification models (denoted here by SMM and MMM, respectively) for target-fishing were implemented. The models were constructed and tested on 65,587 compounds and 308 targets retrieved from the ChEMBL17 database. SMM and MMM performed differently: for 16,344 test compounds, the MMM model returned recall and precision values of 0.8058 and 0.6622, respectively; the corresponding recall and precision values yielded by the SMM model were 0.7805 and 0.7596, respectively. However, a McNemar test (one degree of freedom, significance level 0.05) performed on the target prediction results returned by SMM and MMM for the 16,344 test ligands gave a chi-squared value of 15.656, which exceeds the critical value of 3.841 and favours the MMM approach.
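The McNemar comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how a per-ligand prediction was scored as correct, nor whether a continuity correction was applied, and the discordant counts b and c below are hypothetical. Only the chi-squared value 15.656 is taken from the abstract.

```python
import math


def mcnemar_chi2(b, c, continuity_correction=False):
    """McNemar statistic from the two discordant counts.

    b: ligands that model 1 got right and model 2 got wrong
    c: ligands that model 1 got wrong and model 2 got right
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if continuity_correction:
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical discordant counts, for illustration only.
example_plain = mcnemar_chi2(30, 10)
example_corrected = mcnemar_chi2(30, 10, continuity_correction=True)

# p-value implied by the chi-squared value reported in the abstract.
reported_chi2 = 15.656
reported_p = chi2_sf_1df(reported_chi2)
significant_at_005 = reported_p < 0.05
```

The statistic depends only on the ligands where the two models disagree, so ligands that both models predict correctly (or both incorrectly) do not affect the outcome.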
