Hierarchical Protein Function Prediction with Tail-GNNs
This work addresses protein function prediction, a core task in bioinformatics, and offers a novel way to exploit label hierarchies, although the application of GNNs to label spaces is itself incremental.
The paper frames protein function prediction as predicting subgraphs of a hierarchical label graph and introduces Tail-GNNs, which compose with the output of a neural network to relationally reinforce its labels; the authors report a significant improvement in F_1 score.
Protein function prediction may be framed as predicting subgraphs (with certain closure properties) of a directed acyclic graph describing the hierarchy of protein functions. Graph neural networks (GNNs), with their built-in inductive bias for relational data, are hence naturally suited for this task. However, in contrast with most GNN applications, the graph is related not to the input but to the label space. Accordingly, we propose Tail-GNNs, neural networks which naturally compose with the output space of any neural network for multi-task prediction, to provide relationally-reinforced labels. For protein function prediction, we combine a Tail-GNN with a dilated convolutional network which learns representations of the protein sequence, yielding a significant improvement in F_1 score and demonstrating the ability of Tail-GNNs to learn useful representations of labels and exploit them in real-world problem solving.
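
To make the idea of a GNN acting on the label space concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hypothetical hierarchy (`PARENTS`), scalar per-label logits standing in for the output of the sequence network, a single mean-aggregation message-passing step with fixed weights (`w_self`, `w_nbr`, `bias`) that would be learned in practice, and ancestor closure (a predicted function implies all of its ancestors) as the closure property, which the abstract does not specify.

```python
import math

# Hypothetical toy hierarchy: child label -> list of parent labels (a DAG).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "atp_binding": ["binding"],
}


def neighbours(parents):
    """Undirected adjacency (parents and children) of the label graph."""
    adj = {label: set() for label in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
    return adj


def tail_layer(logits, adj, w_self, w_nbr, bias):
    """One message-passing step over labels.

    Each label's logit is mixed with the mean logit of its neighbours,
    so evidence for one function can reinforce related functions.
    """
    out = {}
    for label, z in logits.items():
        nbrs = adj[label]
        mean_nbr = sum(logits[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        out[label] = w_self * z + w_nbr * mean_nbr + bias
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ancestor_closure(labels, parents):
    """Return the given labels together with all of their ancestors."""
    closed = set()
    stack = list(labels)
    while stack:
        node = stack.pop()
        if node not in closed:
            closed.add(node)
            stack.extend(parents[node])
    return closed


# Stand-in for the per-label output of a sequence model (assumed values).
base_logits = {"root": 2.0, "binding": 1.0, "catalysis": -2.0, "atp_binding": 3.0}

# Fixed weights for illustration only; a real model would learn them.
refined = tail_layer(base_logits, neighbours(PARENTS), w_self=1.0, w_nbr=0.5, bias=0.0)
probs = {label: sigmoid(z) for label, z in refined.items()}
predicted = {label for label, p in probs.items() if p > 0.5}

# Enforce the assumed closure property on the thresholded prediction.
closed_prediction = ancestor_closure(predicted, PARENTS)
```

In this sketch the strong logit for `atp_binding` raises the score of its parent `binding`, which is the kind of relational reinforcement the abstract describes. The closure step is applied after thresholding here purely for illustration; whether and how closure is enforced in the actual model is not stated in the abstract.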