EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
This work addresses enzyme function prediction in bioinformatics with a method that uses protein structure rather than sequence, but it is incremental in that it applies existing 3D CNN techniques to a specific domain.
The authors tackled enzyme classification by developing EnzyNet, a 3D convolutional neural network that predicts Enzyme Commission numbers using voxel-based spatial protein structures, achieving 78.4% accuracy on a dataset of 63,558 enzymes.
Over the past decade, growing computational power and ever-rising data availability have made deep learning techniques increasingly popular, owing to their excellent performance on computer vision problems. The Protein Data Bank has grown more than 15-fold since 1999, which has enabled the expansion of models that aim to predict enzymatic function from amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and is therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The 2-layer architecture was evaluated on a large dataset of 63,558 enzymes from the Protein Data Bank and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
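To make the idea of a binary, voxel-based representation of protein shape more concrete, here is a minimal sketch in plain Python. It is not taken from the EnzyNet repository. The function name `voxelize`, the `grid_size` parameter, the default of 32, and the centring and scaling scheme are all assumptions chosen for illustration, and the paper's actual preprocessing may differ.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(coords: List[Point], grid_size: int = 32) -> Set[Voxel]:
    """Return the occupied cells of a cubic binary grid (hypothetical helper).

    Assumed scheme: centre the points on their centroid, then scale so the
    farthest point lies just inside the grid boundary.
    """
    if not coords:
        return set()

    n = len(coords)
    # Centroid of the structure.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    centred = [(x - cx, y - cy, z - cz) for x, y, z in coords]

    # Largest distance from the centroid, used to fit the shape in the grid.
    radius = max((x * x + y * y + z * z) ** 0.5 for x, y, z in centred)
    half = grid_size / 2
    scale = (half - 1) / radius if radius > 0 else 0.0

    occupied: Set[Voxel] = set()
    for x, y, z in centred:
        # Shift to non-negative coordinates and truncate to a cell index.
        occupied.add((int(x * scale + half),
                      int(y * scale + half),
                      int(z * scale + half)))
    return occupied


# Toy input: four made-up atom positions, not real PDB data.
toy_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
toy_grid = voxelize(toy_atoms, grid_size=32)
```

A cell is either occupied or empty, so the grid encodes shape only. Per-voxel biochemical properties, which the abstract mentions as complementary information, would need additional channels that this sketch does not cover.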