CL · Mar 21, 2022

Better Language Model with Hypernym Class Prediction

Apple
arXiv:2203.10692v1 · 639 citations · h-index: 35
Originality: Incremental advance
AI Analysis

This work addresses rare-word generalization in language modeling. Its contribution is incremental: it revisits an older class-based approach and adapts it to neural models.

The paper tackles context sparsity in neural language models by combining hypernym-based class prediction with curriculum learning. This yields consistent perplexity improvements on the WikiText-103 and Arxiv datasets without harming rare-word performance.
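Hypernym-based classes group words that share a WordNet hypernym into a single prediction target. The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `TOY_HYPERNYMS` is a hypothetical hand-written stand-in for real WordNet lookups, `build_class_vocab` is an illustrative name, and keeping words without a hypernym in singleton classes is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for WordNet: each word maps to one chosen hypernym,
# or None when no suitable hypernym exists (e.g. function words).
TOY_HYPERNYMS: Dict[str, Optional[str]] = {
    "dog": "animal",
    "cat": "animal",
    "sparrow": "animal",
    "oak": "plant",
    "fern": "plant",
    "the": None,
    "runs": None,
}


def build_class_vocab(
    vocab: List[str], hypernyms: Dict[str, Optional[str]]
) -> Tuple[Dict[str, int], List[str]]:
    """Give every token a class id; tokens sharing a hypernym share the id.

    Tokens without a hypernym keep a singleton class (an assumption here).
    """
    class_names: List[str] = []
    key_to_id: Dict[Tuple[str, str], int] = {}
    token_to_class: Dict[str, int] = {}
    for tok in vocab:
        hyp = hypernyms.get(tok)
        # Tag keys so a hypernym class never collides with a literal token.
        key = ("hyp", hyp) if hyp is not None else ("tok", tok)
        if key not in key_to_id:
            key_to_id[key] = len(class_names)
            class_names.append(f"{key[0]}:{key[1]}")
        token_to_class[tok] = key_to_id[key]
    return token_to_class, class_names


vocab = ["the", "dog", "cat", "runs", "oak", "sparrow", "fern"]
token_to_class, class_names = build_class_vocab(vocab, TOY_HYPERNYMS)
print(class_names)  # ['tok:the', 'hyp:animal', 'tok:runs', 'hyp:plant']
```

Here "dog", "cat", and "sparrow" collapse into one class, so contexts observed for any of them contribute to the same prediction target. This is the implicit context aggregation the paper hypothesizes benefits rare words.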

Class-based language models (LMs) have long been used to address context sparsity in $n$-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that share a common WordNet hypernym to the same class and train large neural LMs by gradually annealing from class prediction to token prediction during training. Empirically, this curriculum learning strategy consistently improves perplexity over various large, highly performant, state-of-the-art Transformer-based models on two datasets, WikiText-103 and Arxiv. Our analysis shows that the improvement is achieved without sacrificing performance on rare words. Finally, we document other attempts that failed to yield empirical gains and discuss future directions for adopting class-based LMs on a larger scale.
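The curriculum described above, annealing from class prediction to token prediction, can be illustrated with a blended loss. This is a hedged sketch, not the paper's implementation. It assumes that a class's probability is the summed probability of its member tokens, and that class-level and token-level negative log-likelihoods are mixed with a weight decaying linearly from 1 to 0 over a hypothetical `anneal_steps`. The names `annealed_nll` and `class_weight`, along with the toy logits, are illustrative.

```python
import math
from typing import List


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def class_weight(step: int, anneal_steps: int) -> float:
    """Assumed linear schedule: 1.0 at step 0, reaching 0.0 at anneal_steps."""
    if anneal_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / anneal_steps)


def annealed_nll(
    logits: List[float],
    target: int,
    token_to_class: List[int],
    step: int,
    anneal_steps: int,
) -> float:
    """Blend class-level and token-level negative log-likelihood."""
    log_z = logsumexp(logits)
    token_nll = log_z - logits[target]
    # Class probability = total probability of all tokens in the target's class.
    cls = token_to_class[target]
    members = [x for x, c in zip(logits, token_to_class) if c == cls]
    class_nll = log_z - logsumexp(members)
    lam = class_weight(step, anneal_steps)
    return lam * class_nll + (1.0 - lam) * token_nll


# Tokens 1 and 2 share a hypernym class; tokens 0 and 3 are singletons.
token_to_class = [0, 1, 1, 2]
logits = [0.0, 1.0, 1.0, 0.0]
early = annealed_nll(logits, 1, token_to_class, step=0, anneal_steps=100)
late = annealed_nll(logits, 1, token_to_class, step=100, anneal_steps=100)
```

Because a class's probability is never smaller than that of any of its members, the class term is the easier target early in training. Once the weight reaches zero, the objective reduces to ordinary token-level cross-entropy, so the final model is trained and evaluated as a standard token-level LM.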

Code Implementations: 1 repo