LG · CV · NE · Aug 12, 2016

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

arXiv:1608.03644v4 · 135 citations
Originality: Synthesis-oriented
AI Analysis

The toolkit gives biologists a way to interpret DNN predictions in genomics, but the contribution is incremental: it applies existing visualization methods to a specific domain.

The paper addresses how deep neural networks identify meaningful DNA sequence signals when classifying transcription factor binding sites. It proposes the Deep Motif Dashboard, a toolkit of visualization strategies that includes saliency maps and temporal output scores, and finds that a convolutional-recurrent architecture performs best among the tested models.

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insight into why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard), which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method finds a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, because recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, which indicate the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and the dependencies among them.
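To make the saliency-map idea concrete, here is a minimal sketch of a first-order-derivative saliency score per nucleotide. It is not the DeMo Dashboard code: the model is a hypothetical logistic scorer standing in for a trained DNN, so the gradient can be written analytically, and the names `one_hot`, `predict`, `saliency_map`, the weights, and the example sequence are all assumptions made for illustration. Taking the gradient at the observed base and using its magnitude is one common convention, assumed here.

```python
import math

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Toy stand-in for a trained DNN: a logistic score over the one-hot input."""
    z = bias + sum(
        w * v
        for w_row, x_row in zip(weights, x)
        for w, v in zip(w_row, x_row)
    )
    return sigmoid(z)


def saliency_map(x, weights, bias):
    """Return one saliency value per position.

    For this logistic toy model the first-order derivative is analytic:
    d score / d x[i][j] = score * (1 - score) * weights[i][j].
    Multiplying by the one-hot row keeps only the derivative for the
    nucleotide actually present at each position.
    """
    score = predict(x, weights, bias)
    slope = score * (1.0 - score)
    return [
        abs(sum(slope * w * v for w, v in zip(w_row, x_row)))
        for w_row, x_row in zip(weights, x)
    ]


# Hypothetical example: weights that reward the pattern "GTG" at positions 2-4.
seq = "ACGTGA"
x = one_hot(seq)
weights = [[0.0] * 4 for _ in seq]
weights[2][ALPHABET.index("G")] = 2.0
weights[3][ALPHABET.index("T")] = 2.0
weights[4][ALPHABET.index("G")] = 2.0
bias = -3.0

sal = saliency_map(x, weights, bias)
for base, value in zip(seq, sal):
    print(base, round(value, 4))
```

With a real network the derivative would come from automatic differentiation rather than a closed form, but the reading is the same: positions with larger saliency contribute more to the predicted binding score, so high-saliency runs are candidate motifs.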

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
