NorBench -- A Benchmark for Norwegian Language Models
NorBench addresses the problem of evaluating Norwegian LMs for researchers and practitioners, but the contribution is incremental, as it adapts existing benchmarking approaches to a specific language.
The authors address the lack of standardized evaluation for Norwegian language models by introducing NorBench, a benchmark suite of tasks and probes, together with a set of new models, and they compare the performance of these and existing models across the benchmark tests.
We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-decoder based). Finally, we compare and analyze their performance, along with other existing LMs, across the different benchmark tests of NorBench.