CL · Mar 23, 2023

SwissBERT: The Multilingual Language Model for Switzerland

arXiv:2303.13310v3132 citations · h-index: 49 · Has Code
Originality: Synthesis-oriented
AI Analysis

SwissBERT provides a domain-specific tool for processing Switzerland-related multilingual text, though the contribution is incremental: it adapts existing methods to new data.

The researchers address the processing of Switzerland-related multilingual text by creating SwissBERT, a masked language model adapted to news articles in German, French, Italian, and Romansh. It tends to outperform previous models on Switzerland-specific natural language understanding tasks, especially for contemporary news and Romansh Grischun.

We present SwissBERT, a masked language model created specifically for processing Switzerland-related text. SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland -- German, French, Italian, and Romansh. We evaluate SwissBERT on natural language understanding tasks related to Switzerland and find that it tends to outperform previous models on these tasks, especially when processing contemporary news and/or Romansh Grischun. Since SwissBERT uses language adapters, it may be extended to Swiss German dialects in future work. The model and our open-source code are publicly released at https://github.com/ZurichNLP/swissbert.
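The abstract's point about language adapters can be illustrated with a toy sketch. This is not the released SwissBERT code: it assumes a single shared linear layer followed by one small bottleneck adapter per language, and it uses plain Python lists instead of a deep learning library. The class names, the language codes (`de_CH`, `fr_CH`, `it_CH`, `rm_CH`), and the `gsw` code for a Swiss German adapter are all illustrative assumptions. The sketch shows why such a design is extensible: adding a language only adds a new adapter and leaves the shared weights untouched.

```python
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def relu(vec: Vector) -> Vector:
    return [max(0.0, x) for x in vec]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Bottleneck adapter with a residual connection: x + up(relu(down(x)))."""

    def __init__(self, hidden_size: int, bottleneck: int, rng: random.Random):
        self.down = random_matrix(bottleneck, hidden_size, rng)
        self.up = random_matrix(hidden_size, bottleneck, rng)

    def __call__(self, hidden: Vector) -> Vector:
        update = matvec(self.up, relu(matvec(self.down, hidden)))
        return [h + u for h, u in zip(hidden, update)]


class AdapterLayer:
    """Hypothetical layer: shared weights plus one adapter per language."""

    def __init__(self, hidden_size: int, bottleneck: int, languages: list[str], seed: int = 0):
        self.hidden_size = hidden_size
        self.bottleneck = bottleneck
        self._rng = random.Random(seed)
        # Shared across all languages; never modified when a language is added.
        self.shared = random_matrix(hidden_size, hidden_size, self._rng)
        self.adapters: dict[str, Adapter] = {}
        for lang in languages:
            self.add_language(lang)

    def add_language(self, lang: str) -> None:
        """Register a fresh adapter for a new language."""
        if lang in self.adapters:
            raise ValueError(f"adapter for {lang!r} already exists")
        self.adapters[lang] = Adapter(self.hidden_size, self.bottleneck, self._rng)

    def forward(self, hidden: Vector, lang: str) -> Vector:
        """Apply the shared weights, then route through the adapter for `lang`."""
        return self.adapters[lang](matvec(self.shared, hidden))


# Assumed language codes for the four national languages.
layer = AdapterLayer(hidden_size=4, bottleneck=2,
                     languages=["de_CH", "fr_CH", "it_CH", "rm_CH"])
shared_before = [row[:] for row in layer.shared]

# Hypothetical extension to a Swiss German dialect adapter.
layer.add_language("gsw")

x = [1.0, 0.5, -0.5, 2.0]
out_de = layer.forward(x, "de_CH")
out_gsw = layer.forward(x, "gsw")
```

In a real model the new adapter would be trained on dialect text while the shared weights stay frozen; the sketch only shows the routing and the fact that registration does not alter the shared parameters.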

Code Implementations: 1 repo