BMLG · Oct 16, 2020

SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning

arXiv:2010.08162v2 · 25 citations · Has Code
AI Analysis

SidechainNet gives researchers in computational biology and machine learning a new dataset for improving protein structure prediction. The contribution is incremental, since it builds on the existing ProteinNet data.

The authors address the lack of datasets for predicting both backbone and sidechain structure in proteins. They created SidechainNet, an all-atom protein structure dataset that extends ProteinNet, and released it publicly with angle and atomic coordinate data for all heavy atoms.

Despite recent advancements in deep learning methods for protein structure prediction and representation, little focus has been directed at the simultaneous inclusion and prediction of protein backbone and sidechain structure information. We present SidechainNet, a new dataset that directly extends the ProteinNet dataset. SidechainNet includes angle and atomic coordinate information capable of describing all heavy atoms of each protein structure. In this paper, we provide background information on the availability of protein structure data and the significance of ProteinNet. Thereafter, we argue for the potentially beneficial inclusion of sidechain information through SidechainNet, describe the process by which we organize SidechainNet, and provide a software package (https://github.com/jonathanking/sidechainnet) for data manipulation and training with machine learning models.
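The abstract pairs "angle" information with atomic coordinates. One common way to relate the two is the torsion (dihedral) angle defined by four consecutive atoms. The block below is a minimal sketch of that calculation in plain Python. It is not taken from the SidechainNet package, and the function name `dihedral` and its helpers are hypothetical names chosen for illustration.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral(p0, p1, p2, p3):
    """Torsion angle in radians, in (-pi, pi], defined by four atom positions.

    Hypothetical helper for illustration; uses the standard atan2 form
    atan2(|b1| * b0 . (b1 x b2), (b0 x b1) . (b1 x b2)).
    """
    b0 = _sub(p1, p0)  # bond p0 -> p1
    b1 = _sub(p2, p1)  # central bond p1 -> p2
    b2 = _sub(p3, p2)  # bond p2 -> p3
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.atan2(y, x)

# Four made-up points in a planar zig-zag (trans) arrangement.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(round(abs(math.degrees(trans)), 6))
```

The points in the usage line are invented for the example and are not drawn from any protein structure. A planar zig-zag of four atoms corresponds to a torsion of magnitude 180 degrees, and a planar U-shape to 0 degrees.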
