ML LG · Jun 6, 2020

Do RNN and LSTM have Long Memory?

Jingyu Zhao, Feiqing Huang, Jia Lv, Yanjie Duan, Zhen Qin, Guodong Li, Guangjian Tian

arXiv:2006.03860v222.2192 citations

Originality: Incremental advance

AI Analysis

The paper addresses a fundamental limitation of sequence models that matters to AI/ML researchers, though the advance is incremental because it builds on the existing RNN and LSTM frameworks.

The paper proves, from a statistical perspective, that RNN and LSTM lack long memory. It then modifies both networks so that their weights decay at a polynomial rate, and reports improved performance in modeling long-term dependencies on various datasets.

The LSTM network was proposed to overcome the difficulty in learning long-term dependence, and has made significant advancements in applications. With its success and drawbacks in mind, this paper raises the question - do RNN and LSTM have long memory? We answer it partially by proving that RNN and LSTM do not have long memory from a statistical perspective. A new definition for long memory networks is further introduced, and it requires the model weights to decay at a polynomial rate. To verify our theory, we convert RNN and LSTM into long memory networks by making a minimal modification, and their superiority is illustrated in modeling long-term dependence of various datasets.
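The contrast between polynomially decaying weights and faster-decaying ones can be illustrated with a small numerical sketch. This is not the paper's definition or its network modification: the weight families `rate**k` and `k**(-power)`, the parameter values, and the helper `tail_share` are all assumptions chosen only to show how much weight each kind of decay leaves on distant lags.

```python
def exponential_weights(rate, n_lags):
    """w_k = rate**k for k = 1..n_lags (geometric decay, 0 < rate < 1)."""
    return [rate ** k for k in range(1, n_lags + 1)]


def polynomial_weights(power, n_lags):
    """w_k = k**(-power) for k = 1..n_lags (polynomial decay, power > 0)."""
    return [k ** (-power) for k in range(1, n_lags + 1)]


def tail_share(weights, cutoff):
    """Fraction of the total weight carried by lags beyond `cutoff`."""
    return sum(weights[cutoff:]) / sum(weights)


# Illustrative values only (assumed, not taken from the paper).
n_lags = 10_000
exp_w = exponential_weights(0.9, n_lags)
poly_w = polynomial_weights(0.7, n_lags)

for name, w in (("exponential", exp_w), ("polynomial", poly_w)):
    share = tail_share(w, 100)
    print(f"{name}: share of weight beyond lag 100 = {share:.3g}")
```

Under these assumed settings, the geometric weights place almost nothing beyond lag 100, while the polynomial weights still carry most of their total mass there. That is the intuition behind tying long memory to a polynomial decay rate.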
