Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate
The work addresses computational inefficiency and poor generalization in RNNs for sequential data processing, though it appears to be an incremental enhancement to existing RNN architectures.
The paper tackles vanishing/exploding gradients in vanilla RNNs and over-parameterization in gated RNNs by proposing a Delayed Memory Unit (DMU) that uses delay gates to distribute each input to the optimal future time instants, achieving superior temporal modeling with fewer parameters across tasks such as speech recognition and ECG segmentation.
Recurrent Neural Networks (RNNs) are widely recognized for their proficiency in modeling temporal dependencies, making them highly prevalent in sequential data processing applications. Nevertheless, vanilla RNNs suffer from the well-known problem of vanishing and exploding gradients, which makes learning long-range dependencies difficult. Additionally, gated RNNs tend to be over-parameterized, resulting in poor computational efficiency and network generalization. To address these challenges, this paper proposes a novel Delayed Memory Unit (DMU). The DMU incorporates a delay line structure along with delay gates into the vanilla RNN, thereby enhancing temporal interaction and facilitating temporal credit assignment. Specifically, the DMU is designed to directly distribute the input information to the optimal time instant in the future, rather than aggregating and redistributing it over time through intricate network dynamics. Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, utilizing considerably fewer parameters than other state-of-the-art gated RNN models in applications such as speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification.
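
The delay-line idea in the abstract can be illustrated with a small NumPy sketch. The abstract gives no equations, so everything below is an assumption made for illustration and not the authors' formulation: the class name `DelayedMemoryCell`, the softmax delay gate, the weight names, and the rule that the hidden state is read from the head of the delay line are all hypothetical. The sketch only shows how a gate might write a vanilla-RNN candidate state into slots that arrive a chosen number of steps in the future.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class DelayedMemoryCell:
    """Illustrative delay-line RNN cell (hypothetical, not the paper's equations)."""

    def __init__(self, input_size, hidden_size, num_delays, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Vanilla-RNN candidate weights.
        self.W = rng.uniform(-s, s, (hidden_size, input_size))
        self.U = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b = np.zeros(hidden_size)
        # Delay-gate weights: one logit per delay slot (assumed design).
        self.Wg = rng.uniform(-s, s, (num_delays, input_size))
        self.Ug = rng.uniform(-s, s, (num_delays, hidden_size))
        self.bg = np.zeros(num_delays)
        self.hidden_size = hidden_size
        self.num_delays = num_delays

    def init_state(self):
        h = np.zeros(self.hidden_size)
        # line[k] holds information scheduled to arrive k steps from now.
        line = np.zeros((self.num_delays, self.hidden_size))
        return h, line

    def step(self, x, state):
        h_prev, line = state
        # Candidate state, as in a vanilla RNN.
        cand = np.tanh(self.W @ x + self.U @ h_prev + self.b)
        # Delay gate: how much of the candidate goes to each future slot.
        gate = softmax(self.Wg @ x + self.Ug @ h_prev + self.bg)
        # Write the gated candidate into every slot of the delay line.
        line = line + gate[:, None] * cand[None, :]
        # Read the slot that arrives now, then advance the line by one step.
        h = line[0].copy()
        line = np.vstack([line[1:], np.zeros((1, self.hidden_size))])
        return h, (h, line), gate

def run(cell, xs):
    # Unroll the cell over a sequence of input vectors; returns all hidden states.
    state = cell.init_state()
    hs = []
    for x in xs:
        h, state, _ = cell.step(x, state)
        hs.append(h)
    return np.stack(hs)
```

In this sketch the delay gate adds only `num_delays` rows of weights on top of a vanilla RNN, which is one plausible reading of the parameter-efficiency claim; the actual DMU parameterization may differ.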