LG · CL · Dec 3, 2014

Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation

Noam Shazeer, Joris Pelemans, Ciprian Chelba

arXiv:1412.1454v2 · 3 citations

Originality: Highly original

AI Analysis

This work addresses scalable and flexible language modeling for NLP applications. Its main advantage is a lower computational cost than maximum entropy and recurrent neural network (RNN) language model estimation.

The authors approach language model estimation with Sparse Non-negative Matrix (SNM) techniques. On the One Billion Word Benchmark, SNM $n$-gram models perform almost as well as Kneser-Ney models; with skip-gram features, SNM models match state-of-the-art RNN models, and combining SNM with RNN models sets a new best result on the benchmark.

We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation. A first set of experiments evaluating it empirically on the One Billion Word Benchmark shows that SNM $n$-gram LMs perform almost as well as the well-established Kneser-Ney (KN) models. When using skip-gram features, the models are able to match the state-of-the-art recurrent neural network (RNN) LMs; combining the two modeling techniques yields the best known result on the benchmark. The computational advantages of SNM over both maximum entropy and RNN LM estimation are probably its main strength: SNM promises to combine arbitrary features as flexibly as those approaches, yet should scale to very large amounts of data as gracefully as $n$-gram LMs do.
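The abstract does not spell out the estimator, so the following is only a minimal Python sketch under stated assumptions. It assumes that an SNM-style model scores a candidate word w by summing non-negative matrix entries M[f][w] over the features f active in the history and then normalizing over words. The feature set (empty context, $n$-gram contexts, and a simplified "one word followed by r skipped positions" skip-gram), the function names `extract_features`, `train` and `prob`, and the choice M[f][w] = C(f, w) / C(f) with no learned adjustment are all hypothetical simplifications, not the authors' implementation.

```python
from collections import defaultdict

def extract_features(history, max_order=3, max_skip=2):
    # Hypothetical feature set: empty context, n-gram contexts, and
    # simplified skip-grams ("one context word, then r skipped positions").
    feats = ["<empty>"]
    for n in range(1, max_order):
        if len(history) >= n:
            feats.append("ngram:" + " ".join(history[-n:]))
    for r in range(1, max_skip + 1):
        if len(history) >= r + 1:
            feats.append(f"skip:{history[-(r + 1)]}_[{r}]")
    return feats

def train(sentences, **kw):
    # Count C(f, w) and C(f) over every (history, target word) position.
    pair = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for sent in sentences:
        for i, w in enumerate(sent):
            for f in extract_features(sent[:i], **kw):
                pair[f][w] += 1.0
                total[f] += 1.0
    # Sparse non-negative matrix stored as nested dicts: M[f][w] = C(f, w) / C(f).
    # Assumption: any learned per-entry adjustment is omitted (treated as 1).
    return {f: {w: c / total[f] for w, c in row.items()} for f, row in pair.items()}

def prob(M, history, **kw):
    # y_w = sum of M[f][w] over active features f; then normalize over w.
    y = defaultdict(float)
    for f in extract_features(history, **kw):
        for w, m in M.get(f, {}).items():
            y[w] += m
    z = sum(y.values())
    return {w: v / z for w, v in y.items()}

# Usage on a toy corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
M = train(corpus)
p = prob(M, ["the", "cat"])  # p["sat"] is about 0.833 (5/6)
```

Because every entry is a non-negative relative frequency, adding a new feature type only adds rows to the sparse matrix; the sketch keeps that property but makes no claim about how the paper weights or adjusts those rows.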

