QM · CE · LG · OC · BM · ML · Aug 8, 2013

Predicting protein contact map using evolutionary and physical constraints by integer programming (extended version)

arXiv:1308.1975v2 · 143 citations
Originality: Highly original
AI Analysis

This work addresses a key problem in computational biology for protein structure prediction, offering a more accurate and faster method that reduces reliance on large sequence homolog datasets.

The paper tackles contact map prediction from sequence information by integrating evolutionary and physical constraints through machine learning and integer linear programming. The resulting method, PhyCMAP, outperforms existing methods in both accuracy and speed, completing predictions within minutes after the PSIBLAST search.

Motivation. A protein contact map describes the pairwise spatial and functional relationships of residues in a protein and contains key information for protein 3D structure prediction. Although studied extensively, predicting a contact map from sequence information alone remains very challenging. Most existing methods predict the contact map matrix element by element, ignoring correlation among contacts and the physical feasibility of the whole contact map. A couple of recent methods predict the contact map from residue co-evolution, taking contact correlation into consideration and enforcing a sparsity restraint, but these methods require a very large number of sequence homologs for the protein under consideration, and the resulting contact map may still be physically unfavorable.

Results. This paper presents a novel method, PhyCMAP, for contact map prediction, integrating both evolutionary and physical restraints by machine learning and integer linear programming (ILP). The evolutionary restraints include sequence profile, residue co-evolution, and context-specific statistical potential. The physical restraints specify more concrete relationships among contacts than the sparsity restraint does. As such, our method greatly reduces the solution space of the contact map matrix and thus significantly improves prediction accuracy. Experimental results confirm that PhyCMAP outperforms currently popular methods no matter how many sequence homologs are available for the protein under consideration. PhyCMAP can predict contacts within minutes after the PSIBLAST search for sequence homologs is done, much faster than the two recent methods PSICOV and EvFold. See http://raptorx.uchicago.edu for the web server.
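The idea of choosing a whole contact map jointly under physical restraints, rather than thresholding each matrix element on its own, can be illustrated with a toy 0-1 program. The sketch below is not PhyCMAP's formulation: the objective (predicted probability minus 0.5), the cap on contacts per residue, the minimum sequence separation, and the function name `best_contact_map` are all assumptions made for illustration, and exhaustive enumeration stands in for a real ILP solver, so it only works for a handful of candidate pairs.

```python
from itertools import product


def best_contact_map(prob, max_contacts_per_residue=2, min_separation=3):
    """Pick the best set of contacts for a tiny toy problem.

    prob maps a residue pair (i, j), i < j, to a predicted contact
    probability. Each pair gets a binary variable; the toy objective
    (an assumption, not the paper's) is the sum of (prob - 0.5) over
    selected pairs, subject to a hypothetical cap on how many contacts
    any single residue may take part in.
    """
    # Candidate pairs: only those far enough apart in sequence.
    pairs = sorted(p for p in prob if p[1] - p[0] >= min_separation)

    best_score, best_choice = 0.0, []
    # Enumerate every 0/1 assignment; feasible only for very small inputs.
    for bits in product((0, 1), repeat=len(pairs)):
        chosen = [p for p, b in zip(pairs, bits) if b]

        # Physical-style restraint: count contacts per residue.
        degree = {}
        for i, j in chosen:
            degree[i] = degree.get(i, 0) + 1
            degree[j] = degree.get(j, 0) + 1
        if any(d > max_contacts_per_residue for d in degree.values()):
            continue  # infeasible assignment

        score = sum(prob[p] - 0.5 for p in chosen)
        if score > best_score:
            best_score, best_choice = score, chosen
    return best_choice, best_score


# Made-up probabilities for a 6-residue toy chain.
toy_prob = {
    (0, 3): 0.9, (0, 4): 0.8, (0, 5): 0.7,
    (1, 4): 0.6, (1, 5): 0.4, (2, 5): 0.55,
}
contacts, score = best_contact_map(toy_prob, max_contacts_per_residue=2)
print(contacts, round(score, 2))
```

In this toy input, residue 0 has three candidate contacts above 0.5, so element-by-element thresholding would accept all three. The per-residue cap forces the joint optimization to drop the weakest one, which is the kind of solution-space reduction the abstract attributes to physical restraints.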

