LG, CV · Dec 22, 2019

On the Initialization of Long Short-Term Memory Networks

arXiv:1912.10454v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses faster and more stable training of LSTM networks, particularly for time-series and medical applications. It is an incremental advance because it builds on existing initialization concepts.

The paper tackles training instability in LSTM networks with a robust weight-initialization method. The method is based on normalized random initialization and aims to keep the variance of the network input and output in the same range. It outperforms state-of-the-art initialization techniques in convergence and generalization on both univariate time-series regression and multivariate disease-progression modeling.

Weight initialization is important for fast convergence and stable training of deep neural networks. In this paper, a robust initialization method is developed to address training instability in long short-term memory (LSTM) networks. It is based on a normalized random initialization of the network weights that aims to keep the variance of the network input and output in the same range. The method is applied to standard LSTMs for univariate time-series regression and to LSTMs that are robust to missing values for multivariate disease-progression modeling. The results show that, in all cases, the proposed initialization method outperforms state-of-the-art initialization techniques in terms of training convergence and generalization performance of the obtained solution.
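The abstract does not give the paper's exact formula, so the following is only a minimal sketch of the general idea, assuming the classic "normalized initialization" of Glorot and Bengio applied per LSTM gate. It is not the authors' method. Weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), which gives Var(W) = 2 / (fan_in + fan_out). When fan_in equals fan_out, this keeps the variance of the pre-activations close to the variance of the input. The helpers `glorot_uniform` and `init_lstm`, the gate names, and the zero biases are illustrative assumptions.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Normalized (Glorot/Xavier) uniform initialization.

    Samples W ~ U(-a, a) with a = sqrt(6 / (fan_in + fan_out)),
    so Var(W) = a^2 / 3 = 2 / (fan_in + fan_out).
    """
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def init_lstm(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: initialize the four LSTM gates (i, f, g, o).

    Each gate gets its own input matrix (hidden x input) and recurrent
    matrix (hidden x hidden), so fan-in and fan-out are computed per gate
    rather than over the stacked 4*hidden rows.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("i", "f", "g", "o"):
        params[f"W_{gate}"] = glorot_uniform(input_dim, hidden_dim, rng)
        params[f"U_{gate}"] = glorot_uniform(hidden_dim, hidden_dim, rng)
        # Assumption: zero biases; a common alternative sets the
        # forget-gate bias to 1.
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params

params = init_lstm(input_dim=64, hidden_dim=64)

# With fan_in == fan_out == n, Var(W) = 1/n, so a unit-variance input x
# produces pre-activations W @ x whose variance is approximately 1.
x = np.random.default_rng(1).standard_normal((64, 10000))
out_var = np.var(params["W_i"] @ x)
print(f"input variance ~1.0, pre-activation variance {out_var:.3f}")
```

The per-gate split is a design choice. If the fans were computed over a single stacked (4*hidden x input) matrix, fan_out would be four times larger and the weights would be scaled too small for each individual gate.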
