CL, Apr 6, 2019

Parallelizable Stack Long Short-Term Memory

arXiv:1904.03409v11091 citations
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in training StackLSTM models for applications like parsing and neural machine translation, offering a practical speed-up for researchers and practitioners in natural language processing.

The paper parallelizes Stack Long Short-Term Memory (StackLSTM) for GPU training by using the model's state access patterns to homogenize computation across discrete stack operations. In parsing experiments, the method scales almost linearly with batch size, and the parallelized PyTorch implementation trains significantly faster than the DyNet C++ implementation.

Stack Long Short-Term Memory (StackLSTM) is useful for applications such as parsing and string-to-tree neural machine translation, but it is notoriously difficult to parallelize for GPU training because its computations depend on discrete operations. In this paper, we tackle this problem by using the state access patterns of StackLSTM to homogenize computations across the different discrete operations. Our parsing experiments show that the method scales almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster than the DyNet C++ implementation.
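
The abstract does not spell out the mechanics, so the following is only a minimal sketch of one way such homogenization could look, not the authors' implementation. It assumes each batch element keeps its stack in a fixed-size buffer with a top pointer, and that push, hold, and pop differ only in how the pointer moves (+1, 0, -1). The names `BatchedStack` and `toy_cell` are hypothetical, NumPy stands in for PyTorch, and `toy_cell` stands in for a real LSTM cell.

```python
import numpy as np

def toy_cell(h_top, x, W, U):
    # Stand-in for an LSTM cell (assumption): any function of
    # (state at stack top, current input) illustrates the idea.
    return np.tanh(h_top @ W + x @ U)

class BatchedStack:
    """Hypothetical batched stack: one buffer and one top pointer per batch element."""

    def __init__(self, batch_size, max_depth, dim):
        self.max_depth = max_depth
        # Slot 0 is the empty-stack state; one extra scratch slot at the end
        # keeps the unconditional write below in bounds.
        self.buf = np.zeros((batch_size, max_depth + 2, dim))
        self.ptr = np.zeros(batch_size, dtype=np.int64)
        self.rows = np.arange(batch_size)

    def top(self):
        # Gather the current top state of every stack in one indexing call.
        return self.buf[self.rows, self.ptr]

    def step(self, new_state, op):
        # Same computation for every element: always write above the top...
        self.buf[self.rows, self.ptr + 1] = new_state
        # ...then let the operation decide only where the pointer lands:
        # +1 keeps the new state (push), 0 ignores it (hold), -1 pops.
        self.ptr = np.clip(self.ptr + op, 0, self.max_depth)

rng = np.random.default_rng(0)
B, D = 3, 4
W, U = rng.normal(size=(D, D)), rng.normal(size=(D, D))
stack = BatchedStack(B, max_depth=8, dim=D)

# Two pushes everywhere, then a different operation per batch element.
ops = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([1, 0, -1])]
history = []
for op in ops:
    x = rng.normal(size=(B, D))
    h = toy_cell(stack.top(), x, W, U)  # one batched cell call, no branching
    stack.step(h, op)
    history.append(stack.top().copy())
```

In this sketch the cell is evaluated once per step for the whole batch regardless of which operation each element takes, at the cost of computing a state that hold and pop then discard. After the third step, the element that held sees the same top as before, and the element that popped sees the state from its first push.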

Code Implementations: 1 repo
