CL · May 19, 2023

Extending Memory for Language Modelling

arXiv:2305.11462v10.5

Originality: Incremental advance

AI Analysis

This paper addresses a bottleneck in natural language understanding that matters to researchers and practitioners, though it appears to be an incremental extension of existing memory networks.

The authors tackled the problem of language models struggling with long sequences due to memory constraints by introducing the Long Term Memory network (LTM), which achieved competitive results on datasets like Penn Treebank, Google Billion Word, and WikiText-2.

Breakthroughs in deep learning and memory networks have driven major advances in natural language understanding. Language is sequential, and the information carried through a sequence can be captured by memory networks. Learning the sequence is therefore a key aspect of learning the language. However, memory networks cannot hold infinitely long sequences in memory and are limited by constraints such as the vanishing or exploding gradient problem. As a result, natural language understanding models are adversely affected when presented with long sequential text. We introduce the Long Term Memory network (LTM) to learn from infinitely long sequences. LTM gives priority to the current inputs so that they have a high impact. Language modeling, an important component of natural language understanding, requires long-term memory, so we use it to test LTM. LTM is tested on the Penn Treebank, Google Billion Word, and WikiText-2 datasets, and we compare it with other language models that require long-term memory.

