DA-LSTM: A Long Short-Term Memory with Depth Adaptive to Non-uniform Information Flow in Sequential Data
This addresses inefficiencies in LSTM models for sequential data processing, offering computational savings for applications with variable information flow.
The paper tackles the problem of non-uniform information distribution in sequential data by developing a Depth-Adaptive LSTM (DA-LSTM) that dynamically adjusts its structure, resulting in 41.78% and 46.01% reductions in convergence time compared to Stacked LSTM and Deep Transition LSTM.
Much sequential data exhibits highly non-uniform information distribution. This cannot be correctly modeled by traditional Long Short-Term Memory (LSTM). To address that, recent works have extended LSTM by adding more activations between adjacent inputs. However, the approaches often use a fixed depth, which is at the step of the most information content. This one-size-fits-all worst-case approach is not satisfactory, because when little information is distributed to some steps, shallow structures can achieve faster convergence and consume less computation resource. In this paper, we develop a Depth-Adaptive Long Short-Term Memory (DA-LSTM) architecture, which can dynamically adjust the structure depending on information distribution without prior knowledge. Experimental results on real-world datasets show that DA-LSTM costs much less computation resource and substantially reduce convergence time by $41.78\%$ and $46.01 \%$, compared with Stacked LSTM and Deep Transition LSTM, respectively.