CL · LG · May 29, 2021

Predictive Representation Learning for Language Modeling

arXiv:2105.14214v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work aims to make language models both more efficient to train and more effective on natural language processing tasks. It is an incremental advance, since it adapts an existing reinforcement learning technique (explicitly supervising representations to predict secondary information) to language modeling.

The paper improves language modeling by explicitly supervising LSTM representations to predict secondary information that is otherwise learned only implicitly. This significantly improves two strong language modeling methods, speeds up convergence, and yields better performance when training data is limited.

To effectively perform the task of next-word prediction, long short-term memory networks (LSTMs) must keep track of many types of information. Some information is directly related to the next word's identity, but some is more secondary (e.g., discourse-level features or features of downstream words). Correlates of secondary information appear in LSTM representations even though they are not part of an explicitly supervised prediction task. In contrast, in reinforcement learning (RL), techniques that explicitly supervise representations to predict secondary information have been shown to be beneficial. Inspired by that success, we propose Predictive Representation Learning (PRL), which explicitly constrains LSTMs to encode specific predictions, like those that might need to be learned implicitly. We show that PRL 1) significantly improves two strong language modeling methods, 2) converges more quickly, and 3) performs better when data is limited. Our work shows that explicitly encoding a simple predictive task facilitates the search for a more effective language model.
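The abstract does not say which secondary predictions PRL uses, so the following is only a rough illustration of the general idea: an auxiliary prediction loss computed from the same LSTM hidden states and added to the usual next-word loss. As a stated assumption, the secondary target here is the token `aux_offset` positions ahead (one kind of "feature of downstream words"). The function names `prl_style_loss` and `softmax_cross_entropy`, the projections `w_next`/`w_aux`, and the parameters `aux_offset` and `aux_weight` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def softmax_cross_entropy(logits, targets):
    """Mean cross-entropy of integer `targets` under softmax(`logits`).

    logits: (T, V) float array; targets: (T,) int array of token ids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def prl_style_loss(hidden, w_next, w_aux, tokens, aux_offset=2, aux_weight=0.5):
    """Hypothetical joint objective: next-word loss plus an auxiliary loss.

    hidden: (T, H) LSTM hidden states, where hidden[t] has read tokens[:t + 1].
    w_next, w_aux: (H, V) output projections (assumed; one per task).
    tokens: (T + aux_offset,) int array of token ids, with aux_offset >= 1.
    Returns (total_loss, next_word_loss, auxiliary_loss).
    """
    T = hidden.shape[0]
    # Main task: predict the next token from each hidden state.
    next_targets = tokens[1:T + 1]
    # Illustrative secondary task: predict the token aux_offset steps ahead,
    # from the same hidden states, so they must also encode that information.
    aux_targets = tokens[aux_offset:T + aux_offset]
    next_loss = softmax_cross_entropy(hidden @ w_next, next_targets)
    aux_loss = softmax_cross_entropy(hidden @ w_aux, aux_targets)
    return next_loss + aux_weight * aux_loss, next_loss, aux_loss


# Usage with random stand-in hidden states (no trained LSTM here).
rng = np.random.default_rng(0)
T, H, V = 6, 8, 10
hidden = rng.normal(size=(T, H))
tokens = rng.integers(0, V, size=T + 2)
w_next = rng.normal(size=(H, V))
w_aux = rng.normal(size=(H, V))
total, next_loss, aux_loss = prl_style_loss(hidden, w_next, w_aux, tokens)
print(f"total={total:.3f} next={next_loss:.3f} aux={aux_loss:.3f}")
```

Setting `aux_weight` to 0 recovers the plain next-word objective. In an actual model, the gradient of the auxiliary term would flow back into the LSTM and shape its hidden states, which is the sense in which the representation is "explicitly supervised" rather than left to learn the secondary information implicitly.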

Code Implementations: 1 repo
