Generalizing and Hybridizing Count-based and Neural Language Models
This work addresses the integration of the two major language modeling paradigms, count-based and neural, to improve efficiency and effectiveness in natural language processing tasks, though the contribution appears incremental.
The authors unify count-based and neural language models in a single framework that dynamically mixes probability distributions, yielding hybrid models that combine the scalability of count-based models with the modeling performance of neural ones.
Language models (LMs) are statistical models that calculate probabilities over sequences of words or other discrete symbols. Currently, two major paradigms for language modeling exist: count-based n-gram models, which have the advantages of scalability and test-time speed, and neural LMs, which often achieve superior modeling performance. We demonstrate how both varieties of models can be unified in a single modeling framework that defines a set of probability distributions over the vocabulary of words and then dynamically calculates mixture weights over these distributions. This formulation allows us to create novel hybrid models that combine the desirable features of count-based and neural LMs, and experiments demonstrate the advantages of these approaches.
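The core idea described above, a set of probability distributions over the vocabulary combined with dynamically calculated mixture weights, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy vocabulary, the two component distributions, the raw scores, and the helper names `softmax` and `mix` are hypothetical and are not taken from the paper. In a real model the scores would be computed from the context, for example by count statistics or by a neural network.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix(distributions, weights):
    """Weighted sum of distributions that share the same vocabulary order."""
    vocab_size = len(distributions[0])
    return [
        sum(w * d[i] for w, d in zip(weights, distributions))
        for i in range(vocab_size)
    ]

# Hypothetical toy vocabulary and component distributions (illustrative only).
vocab = ["the", "cat", "sat"]
unigram = [0.5, 0.3, 0.2]  # stands in for a context-independent estimate
bigram = [0.1, 0.7, 0.2]   # stands in for an estimate given the previous word

# Hypothetical context-dependent scores; a real model would compute these
# from features of the context rather than hard-coding them.
scores = [0.0, 1.0]
weights = softmax(scores)

# The mixed distribution is itself a valid distribution over the vocabulary.
mixed = mix([unigram, bigram], weights)
```

Because the weights are non-negative and sum to one, the result is a convex combination of the components and therefore remains a valid probability distribution. Under this view, changing how the weights are computed, while keeping the same components, moves the model between count-based-style interpolation and a learned, neural-style combination.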