QM · CE · LG · Mar 6, 2014

Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction

arXiv:1403.1347v1 · 157 citations
Originality: Incremental advance
AI Analysis

This work addresses a fundamental problem in bioinformatics for protein structure prediction, offering an incremental improvement in accuracy.

The paper tackles protein secondary structure prediction by introducing a supervised generative stochastic network with a convolutional architecture, achieving 66.4% Q8 accuracy on the CB513 dataset, which improves upon the previous best of 64.9%.
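Q8 accuracy is the fraction of residues whose predicted eight-state secondary-structure label matches the observed one. The eight states are the DSSP classes H, G, I, E, B, T, S and coil. The sketch below computes it. Two details are assumptions rather than details from the paper. First, coil is written as "L". Second, residues are pooled across all proteins instead of averaging a per-protein score.

```python
# Eight DSSP classes scored by Q8. Coil is written as "L" here; some
# datasets use "-" or a blank instead, so map those to "L" first.
Q8_STATES = frozenset("HGIEBTSL")


def q8_accuracy(predicted, observed):
    """Per-residue Q8 accuracy pooled over a set of proteins.

    `predicted` and `observed` are parallel lists of label strings, one
    string per protein, with one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("different number of proteins")
    correct = total = 0
    for pred, obs in zip(predicted, observed):
        if len(pred) != len(obs):
            raise ValueError("sequence length mismatch")
        for p, o in zip(pred, obs):
            if p not in Q8_STATES or o not in Q8_STATES:
                raise ValueError(f"unknown label {p!r} or {o!r}")
            correct += p == o  # a bool adds as 0 or 1
            total += 1
    return correct / total if total else 0.0


# 10 of the 12 residues match, so this prints 10/12 (about 0.833).
print(q8_accuracy(["HHHLLEEE", "HHEE"], ["HHHLLEEB", "HHEL"]))
```

On this scale, going from 64.9% to 66.4% means about 15 more correctly labeled residues per 1,000.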

Predicting protein secondary structure is a fundamental problem in protein structure prediction. Here we present a new method, based on a supervised generative stochastic network (GSN), that predicts local secondary structure with deep hierarchical representations. GSN is a recently proposed deep learning technique (Bengio & Thibodeau-Laufer, 2013) for globally training deep generative models. We present the supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, and apply it to protein structure prediction. To scale the model to full-sized, high-dimensional data, such as protein sequences with hundreds of amino acids, we introduce a convolutional architecture, which allows efficient learning across multiple layers of hierarchical representations. Our architecture uniquely focuses on predicting structured low-level labels informed by both the low- and high-level representations learned by the model. In our application, this corresponds to labeling the secondary structure state of each amino-acid residue. We trained and tested the model on separate sets of non-homologous proteins sharing less than 30% sequence identity. Our model achieves 66.4% Q8 accuracy on the CB513 dataset, better than the previously reported best performance of 64.9% (Wang et al., 2011) for this challenging secondary structure prediction problem.
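A GSN-style chain that samples from a conditional distribution alternates two steps. It corrupts the current labels, then draws new labels from a reconstruction distribution, with the input x held fixed (clamped). The toy sketch below shows only that corrupt-then-reconstruct loop. It is not the paper's model: it omits the deep convolutional hidden layers. It also replaces the trained network with a hand-written, hypothetical `reconstruct_probs` rule, and the per-residue score dictionaries used as `x` are invented for illustration. Reading off each residue's most frequent sampled label is likewise an assumption about how a prediction could be taken from the chain.

```python
import random
from collections import Counter

STATES = "HGIEBTSL"  # 8 DSSP classes; coil written as "L"


def corrupt(y, rate, rng):
    """Corruption C(y~ | y): resample each label uniformly with probability `rate`."""
    return [rng.choice(STATES) if rng.random() < rate else s for s in y]


def reconstruct_probs(x, y_tilde, i, context=2.0):
    """Hypothetical stand-in for the learned reconstruction P(y_i | y~, x).

    x[i] maps each state to a positive score derived from the clamped input;
    each neighbouring noisy label that agrees with a state boosts that state.
    A trained network would replace this hand-written rule.
    """
    neighbours = [y_tilde[j] for j in (i - 1, i + 1) if 0 <= j < len(y_tilde)]
    weights = [x[i][s] * (1.0 + context * neighbours.count(s)) for s in STATES]
    total = sum(weights)
    return [w / total for w in weights]


def sample_chain(x, steps=200, rate=0.3, seed=0):
    """Run y_{t+1} ~ P(y | C(y_t), x) with x clamped; return each residue's most frequent label."""
    rng = random.Random(seed)
    y = [rng.choice(STATES) for _ in x]  # arbitrary starting labels
    counts = [Counter() for _ in x]
    for _ in range(steps):
        y_tilde = corrupt(y, rate, rng)
        y = [rng.choices(STATES, weights=reconstruct_probs(x, y_tilde, i))[0]
             for i in range(len(x))]
        for c, s in zip(counts, y):
            c[s] += 1
    return "".join(c.most_common(1)[0][0] for c in counts)


# Toy input: the first four residues favour helix, the last four favour strand.
helix = {s: (5.0 if s == "H" else 0.1) for s in STATES}
strand = {s: (5.0 if s == "E" else 0.1) for s in STATES}
print(sample_chain([helix] * 4 + [strand] * 4))
```

With these sharply peaked toy scores, each residue keeps its favoured state with probability above 0.8 at every step, so the chain settles on `HHHHEEEE`. The neighbour term is there to show how the reconstruction can use the structured context of the noisy labels, not just the input.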

Code Implementations: 1 repo