Memory Matching Networks for Genomic Sequence Classification
This work automates motif-based classification of genomic sequences for bioinformatics researchers. It is an incremental advance that adapts memory models to this domain.
The paper tackles the problem of classifying DNA sequences as protein binding sites by introducing memory matching networks (MMN), which learn a memory bank of encoded motifs and match test sequences to these motifs, achieving improved classification accuracy over baseline methods.
When analyzing the genome, researchers have discovered that proteins bind to DNA based on certain patterns of the DNA sequence known as "motifs". However, it is difficult to manually construct motifs due to their complexity. Recently, externally learned memory models have proven to be effective methods for reasoning over inputs and supporting sets. In this work, we present memory matching networks (MMN) for classifying DNA sequences as protein binding sites. Our model learns a memory bank of encoded motifs, which are dynamic memory modules, and then matches a new test sequence to each of the motifs to classify the sequence as a binding or nonbinding site.
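The matching step described above can be illustrated with a small sketch. This is not the authors' architecture: it assumes one-hot encoded sequences, a hand-written memory bank standing in for learned motif encodings, a sliding-window dot product as the match function, and a softmax attention over motifs followed by a sigmoid output. The names `memory_bank`, `output_weights`, `motif_match_score`, and `classify` are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def motif_match_score(encoded_seq, motif):
    """Best sliding-window dot product between a sequence and one motif."""
    width = len(motif)
    if len(encoded_seq) < width:
        raise ValueError("sequence must be at least as long as the motif")
    best = -math.inf
    for start in range(len(encoded_seq) - width + 1):
        score = sum(
            encoded_seq[start + i][j] * motif[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        best = max(best, score)
    return best


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(seq, memory_bank, output_weights, bias=0.0):
    """Return an estimated probability that seq is a binding site."""
    encoded = one_hot(seq)
    # Match the test sequence against every motif in the memory bank.
    scores = [motif_match_score(encoded, motif) for motif in memory_bank]
    # Attention weights emphasise the motifs that match best.
    attention = softmax(scores)
    logit = bias + sum(
        a * s * w for a, s, w in zip(attention, scores, output_weights)
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Assumption: two hand-picked motifs stand in for a learned memory bank.
memory_bank = [one_hot("TATA"), one_hot("GCGC")]
# Assumption: the first motif signals binding, the second nonbinding.
output_weights = [2.0, -2.0]

p_binding = classify("GGTATAGG", memory_bank, output_weights)
p_nonbinding = classify("AAGCGCAA", memory_bank, output_weights)
```

In a trained model the motif encodings and output weights would be learned from labeled sequences rather than written by hand. The sketch only shows how a test sequence can be scored against each memory entry and how those scores can be combined into a binding or nonbinding decision.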