Unsupervised and Supervised Structure Learning for Protein Contact Prediction
This work addresses the understanding of protein structure and function, with potential applications in contact-assisted folding, but appears incremental because it builds on existing methods.
The thesis tackles protein contact prediction from sequences by building unsupervised graphical models with topology constraints and using supervised deep learning to boost accuracy. It also proposes a diversity score that measures the novelty of contact predictions, together with an algorithm that predicts contacts with respect to that score.
Protein contacts provide key information for understanding protein structure and function, so predicting contacts from sequences is an important problem. Recent research shows that even a subset of correctly predicted long-range contacts can help topology-level structure modeling, so contact-assisted protein folding further underscores the importance of contact prediction. In this thesis, I first briefly review existing related work, then show how to perform contact prediction with unsupervised graphical models under topology constraints. Next, I explain how supervised deep learning methods can further boost the accuracy of contact prediction. Finally, I propose a scoring system, called the diversity score, to measure the novelty of contact predictions, along with an algorithm that predicts contacts with respect to this new scoring system.