Empirical Evaluation of A New Approach to Simplifying Long Short-term Memory (LSTM)
This work addresses the issue of computational complexity in LSTMs for researchers and practitioners in sequence modeling, but it is incremental as it builds on existing LSTM modifications.
The paper tackles the LSTM's complex structure by empirically comparing the standard LSTM with three simplified variants that reduce the parameter count by eliminating certain gate signals. The results show that these variants achieve comparable performance on sequence modeling tasks, although the learning rate must be tuned carefully to reach high accuracy.
The standard LSTM, although it succeeds in modeling long-range dependencies, suffers from a highly complex structure that can be simplified through modifications to its gate units. This paper performs an empirical comparison between the standard LSTM and three new simplified variants, obtained by eliminating the input signal, bias, and hidden unit signal from individual gates, on the task of modeling two sequence datasets. The experiments show that the three variants, with fewer parameters, can achieve performance comparable to the standard LSTM. Due attention should be paid to tuning the learning rate to achieve high accuracy.
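
To make the parameter reduction concrete, the following is a minimal sketch that counts trainable parameters when one term is dropped from the three gates. It assumes the usual LSTM layout in which each gate and the candidate cell update has an input weight matrix, a hidden-state weight matrix, and a bias vector, and it assumes the candidate cell update keeps all three terms. The function names, the dimensions, and the choice to drop one term at a time are illustrative assumptions; the text above does not specify which terms each of the three variants removes.

```python
def gate_params(input_dim, hidden_dim, use_input=True, use_hidden=True, use_bias=True):
    """Count trainable parameters in one gate with hidden_dim units."""
    count = 0
    if use_input:
        count += hidden_dim * input_dim   # input weight matrix
    if use_hidden:
        count += hidden_dim * hidden_dim  # hidden-state weight matrix
    if use_bias:
        count += hidden_dim               # bias vector
    return count


def lstm_params(input_dim, hidden_dim, **gate_terms):
    """Total parameters: one full candidate cell update plus three gates."""
    # Assumption: the candidate cell update is left unchanged.
    cell = gate_params(input_dim, hidden_dim)
    gates = 3 * gate_params(input_dim, hidden_dim, **gate_terms)
    return cell + gates


if __name__ == "__main__":
    m, n = 28, 100  # hypothetical input and hidden sizes
    configs = {
        "standard": {},
        "gates without input signal": {"use_input": False},
        "gates without bias": {"use_bias": False},
        "gates without hidden unit signal": {"use_hidden": False},
    }
    for name, terms in configs.items():
        print(name, lstm_params(m, n, **terms))
```

Under these assumptions, removing the hidden unit signal from the gates saves the most parameters, because that weight matrix grows quadratically with the number of hidden units.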