CL | Jun 4, 2018

Self-Normalization Properties of Language Modeling

arXiv:1806.00913v11090 citations
Originality: Incremental advance
AI Analysis

The paper addresses computational efficiency in language modeling for NLP applications, but it is an incremental advance that builds on existing methods such as NCE and softmax.

The study investigates self-normalization, which lets language models skip computing the partition function, and finds a surprising negative correlation between self-normalization and perplexity across models.
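To make "self-normalization" concrete: a model is self-normalized when, for every context, the log partition function log Z (the log-sum-exp of its output scores) is close to zero, so the raw score exp(s_w) can be used directly as a probability. The sketch below, written in plain Python, assumes self-normalization is quantified by the mean and standard deviation of log Z over a set of contexts; that metric choice and the toy three-word vocabulary are illustrative assumptions, not taken from the paper.

```python
import math


def log_partition(logits):
    """Numerically stable log Z = log(sum_w exp(s_w)) for a single context."""
    m = max(logits)
    return m + math.log(sum(math.exp(s - m) for s in logits))


def self_normalization_stats(batch_logits):
    """Mean and (population) standard deviation of log Z over a batch of contexts.

    A perfectly self-normalized model has log Z == 0 in every context,
    so both statistics would be 0.
    """
    log_zs = [log_partition(row) for row in batch_logits]
    mean = sum(log_zs) / len(log_zs)
    var = sum((z - mean) ** 2 for z in log_zs) / len(log_zs)
    return mean, math.sqrt(var)


# Hypothetical toy scores over a 3-word vocabulary, for two contexts.
# Row 0 exponentiates to probabilities that already sum to 1, so log Z = 0.
row0 = [math.log(0.5), math.log(0.3), math.log(0.2)]
# Row 1 is the same scores shifted by +1, so its log Z is 1.
row1 = [s + 1.0 for s in row0]

mean, std = self_normalization_stats([row0, row1])
print(round(mean, 6), round(std, 6))  # prints: 0.5 0.5
```

Only the first context is self-normalized here; the nonzero mean and spread of log Z show how far the second context's raw scores drift from true probabilities.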

Self-normalizing discriminative models approximate the normalized probability of a class without having to compute the partition function. In the context of language modeling, this property is particularly appealing as it may significantly reduce run-times due to large word vocabularies. In this study, we provide a comprehensive investigation of language modeling self-normalization. First, we theoretically analyze the inherent self-normalization properties of Noise Contrastive Estimation (NCE) language models. Then, we compare them empirically to softmax-based approaches, which are self-normalized using explicit regularization, and suggest a hybrid model with compelling properties. Finally, we uncover a surprising negative correlation between self-normalization and perplexity across the board, as well as some regularity in the observed errors, which may potentially be used for improving self-normalization algorithms in the future.
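The abstract contrasts two routes to self-normalization: softmax training with an explicit regularizer that pushes log Z toward zero, and NCE training, where the model's scores are treated as already-normalized log probabilities. The sketch below shows one common form of each, assuming a penalty of alpha * (log Z)^2 for the regularized softmax and an NCE classifier whose normalizer is fixed at Z = 1 with k noise samples drawn from a noise distribution q. These exact forms, and all function and parameter names, are hypothetical illustrations; the paper's own objectives may differ.

```python
import math


def log_partition(logits):
    """Numerically stable log Z = log(sum_w exp(s_w))."""
    m = max(logits)
    return m + math.log(sum(math.exp(s - m) for s in logits))


def regularized_softmax_loss(logits, target, alpha):
    """Softmax cross-entropy plus an assumed self-normalization penalty alpha * (log Z)^2."""
    log_z = log_partition(logits)
    nll = log_z - logits[target]  # equals -log softmax(logits)[target]
    return nll + alpha * log_z ** 2


def log_sigmoid(x):
    """Stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def nce_loss(logits, target, noise_ids, noise_probs, k):
    """NCE loss that uses exp(s_w) as an unnormalized probability (log Z fixed at 0).

    P(data | w, context) = sigmoid(s_w - log(k * q(w))), with q the noise distribution.
    The full vocabulary sum is never computed here; logits is indexed only at the
    target word and the sampled noise words.
    """
    loss = -log_sigmoid(logits[target] - math.log(k * noise_probs[target]))
    for w in noise_ids:
        # log(1 - sigmoid(x)) == log_sigmoid(-x)
        loss -= log_sigmoid(-(logits[w] - math.log(k * noise_probs[w])))
    return loss


# Hypothetical toy example over a 3-word vocabulary.
normalized = [math.log(0.5), math.log(0.3), math.log(0.2)]  # log Z = 0
shifted = [s + 1.0 for s in normalized]                      # log Z = 1

# Same cross-entropy (-log 0.5) for both; only the shifted scores pay the penalty.
print(regularized_softmax_loss(normalized, 0, alpha=0.1))
print(regularized_softmax_loss(shifted, 0, alpha=0.1))

uniform_q = [1 / 3] * 3
print(nce_loss(normalized, target=0, noise_ids=[1, 2], noise_probs=uniform_q, k=2))
```

The design difference is visible in the signatures: the regularized softmax still computes log Z over the whole vocabulary at training time and merely discourages it from drifting, whereas the NCE loss touches only the target and the k noise words, so any self-normalization it achieves is a side effect of the objective rather than an explicit constraint.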
