CL · Feb 22, 2021

Evaluating Contextualized Language Models for Hungarian

arXiv:2102.10848v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work offers practical guidance for NLP practitioners working with Hungarian, though its contribution is incremental: it applies existing model evaluation methods to a single language.

The researchers compared contextualized language models for Hungarian, finding that the Hungarian-specific huBERT model outperformed four multilingual models across morphological probing, POS tagging, and NER tasks, often by large margins, particularly in middle layers.

We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model, against four multilingual models, including the multilingual BERT model. We evaluate these models on three tasks: morphological probing, POS tagging, and NER. We find that huBERT works better than the other models, often by a large margin, particularly near the global optimum (typically at the middle layers). We also find that huBERT tends to generate fewer subwords per word, and that using the last subword for token-level tasks is generally a better choice than using the first one.
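The first-versus-last subword choice mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `pool_subwords`, the `word_ids` convention (one word index per subword, `None` for special tokens) and the toy vectors are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def pool_subwords(
    word_ids: Sequence[Optional[int]],
    vectors: Sequence[Sequence[float]],
    strategy: str = "last",
) -> List[List[float]]:
    """Pick one subword vector per word (hypothetical helper).

    word_ids[i] is the index of the word that subword i belongs to,
    or None for special tokens such as [CLS] and [SEP].
    """
    if strategy not in ("first", "last"):
        raise ValueError("strategy must be 'first' or 'last'")
    if len(word_ids) != len(vectors):
        raise ValueError("word_ids and vectors must have the same length")

    chosen = {}  # word index -> position of the selected subword
    for pos, wid in enumerate(word_ids):
        if wid is None:
            continue  # skip special tokens
        # "last": always overwrite; "first": keep the earliest position
        if strategy == "last" or wid not in chosen:
            chosen[wid] = pos

    return [list(vectors[chosen[w]]) for w in sorted(chosen)]


# Toy input: a three-subword word followed by a one-subword word.
toy_word_ids = [None, 0, 0, 0, 1, None]
toy_vectors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]

first_pooled = pool_subwords(toy_word_ids, toy_vectors, "first")
last_pooled = pool_subwords(toy_word_ids, toy_vectors, "last")
```

With these toy values, the two strategies differ only for the word that is split into several subwords; a word kept as a single subword gets the same vector either way.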

Code Implementations: 1 repo