Protein (Multi-)Location Prediction: Using Location Inter-Dependencies in a Probabilistic Framework
This work addresses the challenge of accurately predicting the multiple subcellular locations of proteins, a problem relevant to biologists and drug developers, and represents an incremental improvement over existing methods.
The paper tackles the problem of predicting multiple subcellular locations for proteins, which is important for understanding protein function. It develops a method that incorporates location inter-dependencies using a collection of Bayesian network classifiers. The results are significantly higher than those of classifiers that do not use inter-dependencies, and performance on multi-localized proteins is comparable to that of a top system (YLoc+), without restricting predictions to location combinations present in the training set.
Knowing the location of a protein within the cell is important for understanding its function, its role in biological processes, and its potential use as a drug target. Much progress has been made in developing computational methods that predict a single location per protein, under the assumption that each protein localizes to only one location. However, it has been shown that proteins can localize to multiple locations. While a few recent systems have attempted to predict multiple locations of proteins, they typically either treat locations as independent or capture inter-dependencies by treating each location combination present in the training set as an individual location class. We present a new method, and a preliminary system we have developed, that directly incorporates inter-dependencies among locations into the multiple-location prediction process, using a collection of Bayesian network classifiers. We evaluate our system on a dataset of single- and multi-localized proteins. Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to that of a top-performing system (YLoc+), without restricting predictions to location combinations present in the training set.
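
To make the two ideas above concrete, the following is a minimal toy sketch, not the system described here. It estimates from co-occurrence counts how knowing one location changes the probability of another (an inter-dependency), and it shows why treating each observed location combination as its own class rules out combinations absent from the training set. The annotations in `train_labels` and the helper names `marginal` and `conditional` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy annotations: one set of locations per protein.
train_labels = [
    {"nucleus", "cytoplasm"},
    {"nucleus", "cytoplasm"},
    {"nucleus"},
    {"cytoplasm"},
    {"mitochondrion"},
    {"mitochondrion", "cytoplasm"},
    {"membrane"},
    {"membrane"},
]


def marginal(loc):
    """Estimate P(loc) as the fraction of proteins annotated with loc."""
    return sum(loc in s for s in train_labels) / len(train_labels)


def conditional(loc, given):
    """Estimate P(loc | given) from co-occurrence counts."""
    with_given = [s for s in train_labels if given in s]
    return sum(loc in s for s in with_given) / len(with_given)


# An independent per-location classifier only sees the marginal view;
# a dependency-aware one can use the conditional view as well.
p_cyt = marginal("cytoplasm")
p_cyt_given_nuc = conditional("cytoplasm", "nucleus")
p_cyt_given_mem = conditional("cytoplasm", "membrane")

# Combination-as-class view: only combinations seen in training are classes.
seen_combinations = {frozenset(s) for s in train_labels}
unseen = frozenset({"nucleus", "mitochondrion"})

print("P(cytoplasm) =", p_cyt)
print("P(cytoplasm | nucleus) =", p_cyt_given_nuc)
print("P(cytoplasm | membrane) =", p_cyt_given_mem)
print("unseen combination predictable as a class:", unseen in seen_combinations)
```

In this toy data the conditional estimates differ from the marginal, which is the kind of signal a dependency-aware classifier can exploit, while the combination-as-class view cannot output the pair of nucleus and mitochondrion because that pair never occurs in the training annotations.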