LG · DS · ML · Oct 10, 2021

Long Expressive Memory for Sequence Modeling

arXiv:2110.04744v2 · 59 citations
AI Analysis

This paper addresses a fundamental challenge in sequence modeling, learning long-term dependencies, which matters for applications such as speech and language processing. It offers a novel method with empirical gains across a broad range of tasks.

The authors tackled the problem of learning long-term sequential dependencies by proposing Long Expressive Memory (LEM), a gradient-based method that mitigates exploding and vanishing gradients and outperforms state-of-the-art models like RNNs, GRUs, and LSTMs across tasks such as image classification, speech recognition, and language modeling.

We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, it can efficiently process sequential tasks with very long-term dependencies, and it is sufficiently expressive to be able to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds to show the mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.
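The abstract describes deriving LEM from a system of multiscale ordinary differential equations and a time-discretization of it, but it does not state the update equations. The following is a minimal sketch of one way such a discretization could look, assuming two hidden states (`y` and `z`) that each evolve with their own learned, input-dependent time step. The function names, parameter layout, and initialization are illustrative assumptions, not the authors' implementation; consult the paper for the exact formulation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(p, h, u):
    """Return W @ h + V @ u + b for plain nested-list matrices."""
    W, V, b = p
    return [
        sum(w * hj for w, hj in zip(W[i], h))
        + sum(v * uj for v, uj in zip(V[i], u))
        + b[i]
        for i in range(len(b))
    ]


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical initialization: small uniform weights, zero biases."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(hidden_dim)

    def mat(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)]
                for _ in range(rows)]

    # One (W, V, b) triple per gate: two time-step gates and two state updates.
    return {
        name: (mat(hidden_dim, hidden_dim), mat(hidden_dim, input_dim),
               [0.0] * hidden_dim)
        for name in ("dt_z", "dt_y", "z", "y")
    }


def lem_step(params, y, z, u, dt=1.0):
    """One assumed update step with two learned time scales."""
    # Input-dependent time steps, each entry in (0, dt).
    dt_z = [dt * sigmoid(a) for a in affine(params["dt_z"], y, u)]
    dt_y = [dt * sigmoid(a) for a in affine(params["dt_y"], y, u)]
    # Each state is a convex combination of its old value and a tanh update.
    z_new = [(1.0 - d) * zi + d * math.tanh(a)
             for d, zi, a in zip(dt_z, z, affine(params["z"], y, u))]
    y_new = [(1.0 - d) * yi + d * math.tanh(a)
             for d, yi, a in zip(dt_y, y, affine(params["y"], z_new, u))]
    return y_new, z_new


def run_sequence(params, inputs, hidden_dim, dt=1.0):
    """Apply lem_step over a list of input vectors, starting from zero states."""
    y = [0.0] * hidden_dim
    z = [0.0] * hidden_dim
    for u in inputs:
        y, z = lem_step(params, y, z, u, dt)
    return y, z


if __name__ == "__main__":
    params = init_params(input_dim=3, hidden_dim=4, seed=1)
    inputs = [[math.sin(0.1 * t), math.cos(0.1 * t), 1.0] for t in range(200)]
    y, z = run_sequence(params, inputs, hidden_dim=4)
    print(y)
```

In this sketch, with `dt` at most 1, every update is a convex combination of the previous state and a value in (-1, 1), so the hidden states stay bounded over arbitrarily long sequences. That illustrates the kind of stability the abstract alludes to but is not a substitute for the paper's gradient bounds.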

Code Implementations: 1 repo