CL · Aug 20, 2016

Using the Output Embedding to Improve Language Models

arXiv:1608.05859v3 · 819 citations
Originality: Incremental advance
AI Analysis

This work lowers perplexity in language modeling and reduces model size in machine translation, offering a practical improvement for researchers and practitioners.

The authors improve neural network language models by tying the input and output embeddings, that is, by using one shared matrix for both. This significantly reduces perplexity and allows neural translation models to shrink to less than half their original size without loss of performance.

We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
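
To make the idea of weight tying concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy model has no hidden layers, and the names `make_embedding`, `next_word_probs`, and `count_params`, as well as the sizes `VOCAB` and `HIDDEN`, are assumptions chosen for illustration. It only shows that the output matrix has one row per word (so it has the shape of an embedding), and that sharing it with the input embedding halves the embedding parameter count.

```python
import math
import random


def make_embedding(vocab_size, hidden_size, seed):
    """Return a vocab_size x hidden_size matrix of small random weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(vocab_size)]


def next_word_probs(word_id, input_emb, output_emb):
    """Toy language model step: look up the input vector, score every word
    against the rows of the output matrix, and apply a softmax."""
    h = input_emb[word_id]  # input embedding lookup (no hidden layers here)
    logits = [sum(w * x for w, x in zip(row, h)) for row in output_emb]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def count_params(*matrices):
    """Count weights, counting a matrix shared between roles only once."""
    sizes = {id(mat): len(mat) * len(mat[0]) for mat in matrices}
    return sum(sizes.values())


VOCAB, HIDDEN = 6, 4  # hypothetical toy sizes

# Untied model: separate input and output embedding matrices.
U = make_embedding(VOCAB, HIDDEN, seed=0)
V = make_embedding(VOCAB, HIDDEN, seed=1)
untied_probs = next_word_probs(2, U, V)
untied_params = count_params(U, V)

# Tied model: one shared matrix S plays both roles.
S = make_embedding(VOCAB, HIDDEN, seed=2)
tied_probs = next_word_probs(2, S, S)
tied_params = count_params(S, S)

print("untied embedding parameters:", untied_params)
print("tied embedding parameters:  ", tied_params)
```

Because the tied model passes the same matrix object for both roles, any gradient update to `S` would affect the input lookup and the output scores at once, which is the situation the abstract's analysis of update rules refers to.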

Code Implementations: 10 repos

Data from Papers with Code (CC-BY-SA-4.0)
