CL, Jan 26, 2017

emLam -- a Hungarian Language Modeling baseline

arXiv:1701.07880v1, 0.73 citations

Originality: Synthesis-oriented

AI Analysis

The paper provides a foundational resource for researchers and practitioners working on Hungarian NLP, but its contribution is incremental: it adapts existing methods to a new language.

The paper addresses the lack of documented baselines for Hungarian language modeling by evaluating various approaches on three publicly available Hungarian corpora. It reports perplexity values comparable to those of models trained on similar-sized English corpora and introduces a new, freely downloadable Hungarian benchmark corpus.

This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.
