CLDec 19, 2019

BERTje: A Dutch BERT Model

arXiv:1912.09582v1322 citationsHas Code
Originality Synthesis-oriented
AI Analysis

This provides a specialized tool for Dutch NLP applications, though it is incremental as it adapts an existing method to a new language.

The authors tackled the lack of a high-performance monolingual Dutch BERT model by developing BERTje, which consistently outperforms the multilingual BERT model on downstream NLP tasks such as part-of-speech tagging and sentiment analysis.

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic role labeling, and sentiment analysis). Our pre-trained Dutch BERT model is made available at https://github.com/wietsedv/bertje.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes