Are All Languages Equally Hard to Language-Model?
This work addresses the challenge of evaluating language models fairly across typologically diverse languages, a concern for both researchers and practitioners. The contribution is incremental, since it builds on existing methods.
The authors tackle fair cross-linguistic comparison of language models by developing an evaluation framework based on translated text. Across 21 languages, using n-gram and LSTM models, they find that languages with complex inflectional morphology are harder to predict.
For general modeling methods applied to diverse languages, a natural question is: how well should we expect our models to work on languages with differing typological profiles? In this work, we develop an evaluation framework for fair cross-linguistic comparison of language models, using translated text so that all models are asked to predict approximately the same information. We then conduct a study on 21 languages, demonstrating that in some languages, the textual expression of the information is harder to predict with both $n$-gram and LSTM language models. We show complex inflectional morphology to be a cause of performance differences among languages.
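To make the idea of predicting "approximately the same information" concrete, here is a minimal sketch of one way such a comparison could be scored. It assumes, as an illustration not stated in the abstract, that each language's model is charged the total number of bits it needs to encode its side of an aligned translation, and that this total is divided by a denominator shared by all languages (here, the character length of a common reference translation). The add-alpha character unigram model is a stand-in for the $n$-gram and LSTM models studied, and all function names are hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for characters unseen in training


def train_char_unigram(text, alpha=1.0):
    """Return an add-alpha smoothed character unigram distribution as a dict.

    A deliberately simple stand-in for a real language model.
    """
    counts = Counter(text)
    vocab = list(counts) + [UNK]
    total = sum(counts.values()) + alpha * len(vocab)
    return {ch: (counts.get(ch, 0) + alpha) / total for ch in vocab}


def total_bits(model, text):
    """Total code length of `text` in bits: -sum of log2 p(ch)."""
    return -sum(math.log2(model.get(ch, model[UNK])) for ch in text)


def bits_per_reference_char(model, text, reference_text):
    """Normalize total bits by the length of a shared reference translation.

    Assumption: `text` and `reference_text` are translations of each other,
    so every language is scored against the same denominator.
    """
    return total_bits(model, text) / len(reference_text)


if __name__ == "__main__":
    # Toy strings for illustration only; not data from the study.
    reference = "the houses are big"
    translation = "die Häuser sind groß"
    model = train_char_unigram(translation)
    print(bits_per_reference_char(model, translation, reference))
```

Dividing by a shared denominator, rather than by each language's own token or character count, keeps a language from appearing easier simply because it splits the same content into more, individually cheaper, units.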