LG | DC | COMP-PH | Sep 21, 2023

Parallelizing non-linear sequential models over the sequence length

arXiv:2309.12252v3 | 39 citations | h-index: 6
AI Analysis

The paper addresses a fundamental performance limitation for researchers and practitioners who use sequential models in applications such as long time series classification.

The paper tackles the slow training bottleneck of sequential models like RNNs and Neural ODEs by introducing a parallel algorithm that accelerates GPU evaluation by up to 3 orders of magnitude without compromising accuracy, making training over 10 times faster than sequential methods.

Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherently sequential nature. This bottleneck has persisted for many years because sequential models were widely thought to be impossible to parallelize. We challenge this long-held belief with a parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude without compromising output accuracy. The algorithm does not need any special structure in the sequential model's architecture, making it applicable to a wide range of architectures. Using our method, training sequential models can be more than 10 times faster than with the common sequential method, without any meaningful difference in the training results. Leveraging this accelerated training, we discovered the efficacy of the Gated Recurrent Unit in a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step toward unlocking the potential of non-linear sequential models for long sequence problems.
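The abstract does not spell out the algorithm itself. As an illustrative sketch only, and not the authors' implementation, one way to evaluate a non-linear recurrence over the whole sequence at once is to guess the full trajectory, linearize every step around that guess (a Newton-style update), and solve the resulting linear recurrence with a prefix scan that needs O(log T) passes. The toy scalar model h_t = tanh(w * h_{t-1} + x_t) and the names `sequential_eval`, `linear_scan`, and `parallel_eval` are assumptions made for this example. Plain Python lists stand in for GPU arrays, so the code shows the data dependencies and not the speedup.

```python
import math


def sequential_eval(w, xs, h0):
    """Reference: evaluate h_t = tanh(w * h_{t-1} + x_t) one step at a time."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w * h + x)
        hs.append(h)
    return hs


def linear_scan(a, b, h0):
    """Solve h_t = a_t * h_{t-1} + b_t for all t with a doubling prefix scan.

    Element t holds the affine map composed over a window of steps ending at t.
    Within one pass every t is independent of the others, so a pass could run
    in parallel; the number of passes is O(log T).
    """
    a, b = list(a), list(b)
    n, step = len(a), 1
    while step < n:
        new_a, new_b = a[:], b[:]
        for t in range(step, n):  # independent across t
            # Compose the later window (ending at t) after the earlier one.
            new_a[t] = a[t] * a[t - step]
            new_b[t] = a[t] * b[t - step] + b[t]
        a, b = new_a, new_b
        step *= 2
    return [a[t] * h0 + b[t] for t in range(n)]


def parallel_eval(w, xs, h0, max_iters=100, tol=1e-12):
    """Newton-style fixed-point iteration over the whole trajectory.

    Hypothetical toy version: each iteration evaluates the step function and
    its derivative at every position of the current guess (independent across
    t), then solves the linearized recurrence with linear_scan.
    """
    hs = [0.0] * len(xs)  # initial guess for h_1..h_T
    for it in range(max_iters):
        prev = [h0] + hs[:-1]  # h_{t-1} under the current guess
        f = [math.tanh(w * p + x) for p, x in zip(prev, xs)]
        jac = [w * (1.0 - ft * ft) for ft in f]  # d f / d h_{t-1}
        # Linearization: h_t = jac_t * h_{t-1} + (f_t - jac_t * prev_t)
        b = [ft - jt * p for ft, jt, p in zip(f, jac, prev)]
        new_hs = linear_scan(jac, b, h0)
        err = max(abs(u - v) for u, v in zip(new_hs, hs))
        hs = new_hs
        if err < tol:
            return hs, it + 1
    return hs, max_iters
```

A quick comparison against the step-by-step loop, with arbitrary illustrative inputs:

```python
xs = [math.sin(0.3 * t) for t in range(32)]
reference = sequential_eval(0.9, xs, 0.1)
result, iterations = parallel_eval(0.9, xs, 0.1)
print(iterations, max(abs(u - v) for u, v in zip(reference, result)))
```

In this sketch the first step is reproduced exactly after one iteration, the first two after two, and so on, so the iteration reaches the sequential result in at most T iterations and usually in far fewer. For a vector-valued state such as a GRU, the scalar `jac` would become a Jacobian matrix per time step and the scan would compose matrix-vector pairs.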
