LG · CL · ML · Jun 8, 2018

Towards Binary-Valued Gates for Robust LSTM Training

arXiv:1806.02988v1 · 50 citations
Originality: Incremental advance
AI Analysis

This paper addresses a robustness and interpretability problem for users of LSTMs in sequence modeling, though the advance is incremental because it builds on the existing LSTM structure.

The paper tackles the issue of soft gates in LSTMs not fully controlling information flow by proposing a training method that pushes gate outputs towards binary values (0 or 1), resulting in more interpretable gates and no performance drop, with compressed models even outperforming baselines.

Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way for LSTM training, which pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed, instead of in a middle state, which makes the results more interpretable. Empirical studies show that (1) Although it seems that we restrict the model capacity, there is no performance drop: we achieve better or comparable performances due to its better generalization ability; (2) The outputs of gates are not sensitive to their inputs: we can easily compress the LSTM unit in multiple ways, e.g., low-rank approximation and low-precision approximation. The compressed models are even better than the baseline models without compression.
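
The abstract does not spell out how the gate outputs are pushed towards 0 or 1, so the following is only a minimal sketch under stated assumptions, not the authors' training procedure. It assumes one plausible relaxation: logistic noise is added to the gate pre-activation and the sum is divided by a temperature below 1 before the sigmoid. The names `soft_gate`, `sharpened_gate`, and `temperature` are hypothetical and chosen for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_gate(pre_activation: float) -> float:
    """Standard LSTM gate: a plain sigmoid, which often sits between 0 and 1."""
    return sigmoid(pre_activation)


def sharpened_gate(pre_activation: float, temperature: float = 0.2, rng=None) -> float:
    """Assumed relaxation (not taken from the source text).

    Optionally adds logistic noise, then divides by a temperature < 1 so the
    sigmoid saturates and the output moves towards 0 or 1.
    """
    noise = 0.0
    if rng is not None:
        # Clamp the uniform sample away from 0 and 1 so the logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((pre_activation + noise) / temperature)


# Compare the two gates on a few pre-activation values (noise disabled).
for a in (-2.0, -0.5, 0.5, 2.0):
    print(f"a={a:+.1f}  soft={soft_gate(a):.3f}  sharpened={sharpened_gate(a):.3f}")
```

With the noise disabled, the sharpened gate is simply a steeper sigmoid, so any non-zero pre-activation lands closer to 0 or 1 than it would under the standard gate. Passing a `random.Random` instance as `rng` enables the noise term, which in this sketch stands in for a stochastic training-time relaxation.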

Code Implementations: 1 repo
