CL · Aug 22, 2021

UzBERT: pretraining a BERT model for Uzbek

arXiv:2108.09814v11.220 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

UzBERT provides a specialized tool for Uzbek NLP tasks, but the contribution is incremental: it applies an existing method (BERT pretraining) to a new language.

The paper addresses the lack of a publicly available monolingual pretrained language model for Uzbek by introducing UzBERT, a BERT-based model that greatly outperforms multilingual BERT on masked language model accuracy.

Pretrained language models based on the Transformer architecture have achieved state-of-the-art results in various natural language processing tasks such as part-of-speech tagging, named entity recognition, and question answering. However, no such monolingual model for the Uzbek language is publicly available. In this paper, we introduce UzBERT, a pretrained Uzbek language model based on the BERT architecture. Our model greatly outperforms multilingual BERT on masked language model accuracy. We make the model publicly available under the MIT open-source license.
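The abstract compares models by masked language model accuracy. As a rough illustration only, the sketch below assumes the common definition: the share of masked positions where the model's top-1 prediction equals the original token. The function name `masked_lm_accuracy`, its parameters, and the toy sentence are hypothetical and do not come from the paper, which may define or compute the metric differently.

```python
from typing import Sequence


def masked_lm_accuracy(
    predicted: Sequence[str],
    original: Sequence[str],
    masked_positions: Sequence[int],
) -> float:
    """Share of masked positions where the top-1 prediction equals the original token.

    Assumed definition for illustration; not taken from the UzBERT paper.
    """
    if not masked_positions:
        raise ValueError("no masked positions to score")
    # Count only the positions that were masked; unmasked tokens are ignored.
    correct = sum(1 for i in masked_positions if predicted[i] == original[i])
    return correct / len(masked_positions)


# Hypothetical toy example (not data from the paper).
original = ["men", "kitob", "o'qiyman", "."]
predicted = ["men", "kitob", "yozaman", "."]
masked = [1, 2]  # indices of the tokens hidden from the model

accuracy = masked_lm_accuracy(predicted, original, masked)
print(f"masked LM accuracy: {accuracy:.2f}")
```

Scoring only the masked positions matters here: counting unmasked tokens, which the model can simply copy, would inflate the figure for every model being compared.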
