CL · NE · ML · Feb 23, 2018

Reusing Weights in Subword-aware Neural Language Models

arXiv:1802.08375v2 · 1088 citations
AI Analysis

This work addresses parameter efficiency in neural language modeling. It is incremental in that it builds on existing subword-aware models.

The paper reduces the size of subword-aware neural language models by reusing weights. Reuse improves syllable- and morpheme-aware models but does not help a character-aware one. The best morpheme-aware model has 20%-87% fewer parameters and outperforms the word-level model across multiple languages.

We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters.
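The idea of reusing subword embeddings at the output can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, the 2-dimensional morpheme vectors, the sum-based composition, and the function names (`word_vector`, `output_distribution`, `parameter_counts`) are all assumptions made for illustration. The sketch has a single input layer (the morpheme embedding table), so tying it is the simplest case of the bottom-up principle stated in the abstract.

```python
import math

# Hypothetical toy data: morpheme segmentations and 2-d morpheme embeddings.
SEGMENTS = {
    "unkind": ["un", "kind"],
    "kindly": ["kind", "ly"],
    "kind": ["kind"],
}
MORPH_EMB = {
    "un": [-1.0, 0.5],
    "kind": [0.8, 0.3],
    "ly": [0.1, -0.4],
}


def word_vector(word):
    """Compose a word vector by summing its morpheme embeddings (assumed composition)."""
    vecs = [MORPH_EMB[m] for m in SEGMENTS[word]]
    return [sum(dims) for dims in zip(*vecs)]


def output_distribution(hidden):
    """Softmax over the vocabulary, scoring each word with the same
    composed vector that would be fed to the model at input (weight reuse)."""
    logits = {
        w: sum(a * b for a, b in zip(word_vector(w), hidden))
        for w in SEGMENTS
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def parameter_counts(vocab_size, morph_count, dim):
    """Embedding parameters only (recurrent layers and biases are ignored).

    untied: morpheme table at input plus a separate word-level output matrix
    tied:   the morpheme table alone, reused at output
    """
    untied = morph_count * dim + vocab_size * dim
    tied = morph_count * dim
    return untied, tied


if __name__ == "__main__":
    print(output_distribution([1.0, 0.0]))
    print(parameter_counts(vocab_size=10000, morph_count=2000, dim=100))
```

Because the output scores are computed from the morpheme table, no separate word-level output matrix has to be stored. In this sketch that accounts for the whole saving; the vocabulary and morpheme counts passed to `parameter_counts` are made-up values, not figures from the paper.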

Code Implementations: 1 repo
