Jun 2, 2021

DNA-GCN: Graph convolutional networks for predicting DNA-protein binding

arXiv:2106.01836v1 · 9 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a classic bioinformatics problem for researchers in genomics, but it is incremental as it applies an existing graph-based method to a new domain without major breakthroughs.

The authors propose DNA-GCN, a graph convolutional network that models sequence data as a k-mer graph to predict DNA-protein binding, and report performance competitive with the baseline model on 50 ENCODE datasets.

Predicting DNA-protein binding is an important and classic problem in bioinformatics. Convolutional neural networks have outperformed conventional methods in modeling the sequence specificity of DNA-protein binding. However, no previous study has used graph convolutional networks for motif inference. In this work, we propose to do so. We build a single sequence k-mer graph for the whole dataset, based on k-mer co-occurrence and on the relationship between k-mers and the sequences that contain them, and then learn a DNA Graph Convolutional Network (DNA-GCN) on that graph. DNA-GCN is initialized with a one-hot representation for every node and then jointly learns embeddings for both k-mers and sequences, supervised by the known sequence labels. We evaluate the model on 50 datasets from ENCODE, where DNA-GCN performs competitively with the baseline model. We also analyze the model and design several alternative architectures to fit different datasets.
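The graph construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`kmers`, `build_graph`, `normalize`), the raw-count edge weights, and the `window` parameter are all assumptions made for the example, since the abstract does not specify how edges are weighted. The normalization step is the standard symmetric one used by generic GCN layers.

```python
from collections import Counter
from itertools import combinations
import math


def kmers(seq, k):
    """Return the overlapping k-mers of a DNA sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(sequences, k=3, window=2):
    """Build a weighted adjacency over sequence nodes and k-mer nodes.

    Assumed edge weights (illustrative only, not taken from the paper):
      * sequence <-> k-mer: number of times the k-mer occurs in the sequence
      * k-mer <-> k-mer: number of sliding windows in which both k-mers appear
    """
    vocab = sorted({km for s in sequences for km in kmers(s, k)})
    n_seq = len(sequences)
    # Sequence nodes take ids 0..n_seq-1; k-mer nodes follow them.
    index = {km: n_seq + j for j, km in enumerate(vocab)}
    adj = Counter()
    for i, s in enumerate(sequences):
        kms = kmers(s, k)
        # Sequence-to-k-mer edges.
        for km, count in Counter(kms).items():
            adj[(i, index[km])] += count
            adj[(index[km], i)] += count
        # k-mer co-occurrence edges within a sliding window of k-mers.
        for start in range(len(kms) - window + 1):
            distinct = sorted(set(kms[start:start + window]))
            for a, b in combinations(distinct, 2):
                adj[(index[a], index[b])] += 1
                adj[(index[b], index[a])] += 1
    return adj, vocab, index


def normalize(adj, n_nodes):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = Counter(adj)
    for v in range(n_nodes):
        a_hat[(v, v)] += 1  # add self-loops
    degree = Counter()
    for (u, _), w in a_hat.items():
        degree[u] += w
    return {(u, v): w / math.sqrt(degree[u] * degree[v])
            for (u, v), w in a_hat.items()}


sequences = ["ACGTAC", "TACGTT"]  # toy data, not from ENCODE
adj, vocab, index = build_graph(sequences, k=3, window=2)
norm = normalize(adj, len(sequences) + len(vocab))
```

With one-hot node features the input feature matrix is the identity, so the first graph convolution reduces to multiplying this normalized adjacency by a learned weight matrix. Under that reading, each sequence node's first-layer embedding is a weighted combination of the learned rows for the k-mers it contains plus its own row.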

Code Implementations: 1 repo