SuperSim: a test set for word similarity and relatedness in Swedish
This work addresses the challenge of evaluating language models for Swedish by offering a new benchmark for natural language processing researchers. The contribution is incremental, since it adapts existing methods to a new language.
The authors address the problem of evaluating language models by releasing SuperSim, a large-scale similarity and relatedness test set for Swedish built with expert human judgments. It comprises 1,360 word-pairs, each judged by five annotators, and comes with baselines from Word2Vec, fastText, and GloVe models.
Language models are notoriously difficult to evaluate. We release SuperSim, a large-scale similarity and relatedness test set for Swedish built with expert human judgments. The test set is composed of 1,360 word-pairs independently judged for both relatedness and similarity by five annotators. We evaluate three different models (Word2Vec, fastText, and GloVe) trained on two separate Swedish datasets, namely the Swedish Gigaword corpus and a Swedish Wikipedia dump, to provide a baseline for future comparison. We release the fully annotated test set, code, baseline models, and data.
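To make the evaluation setup concrete, the following is a minimal sketch of how a word-embedding model is commonly scored against a test set of this kind. It assumes that the five annotator ratings for each pair are averaged into a single gold score, and that model quality is measured as the Spearman rank correlation between those gold scores and the cosine similarity of the two word vectors. Neither choice is stated in the text above, and all words, vectors, and ratings in the sketch are invented toy values rather than SuperSim data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(pairs, gold, vectors):
    """Score a model on word pairs, skipping pairs with out-of-vocabulary words.

    Returns the Spearman correlation and the number of pairs actually scored.
    """
    model_scores, gold_scores = [], []
    for (w1, w2), g in zip(pairs, gold):
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            gold_scores.append(g)
    return spearman(model_scores, gold_scores), len(gold_scores)


# Toy data (hypothetical, for illustration only).
vectors = {
    "katt": [0.9, 0.1, 0.0],
    "hund": [0.8, 0.2, 0.1],
    "bil": [0.0, 0.9, 0.4],
    "väg": [0.0, 0.9, 0.5],
}
pairs = [("katt", "hund"), ("bil", "väg"), ("katt", "bil"), ("hund", "båt")]
# Five invented annotator ratings per pair, averaged into one gold score
# (averaging is an assumption of this sketch).
ratings = [[9, 8, 9, 7, 8], [7, 8, 6, 7, 7], [1, 0, 2, 1, 1], [2, 1, 2, 3, 2]]
gold = [sum(r) / len(r) for r in ratings]

rho, n_scored = evaluate(pairs, gold, vectors)
print(f"Spearman rho = {rho:.3f} on {n_scored} of {len(pairs)} pairs")
```

Under this setup, the same function could be run once with similarity judgments and once with relatedness judgments as the gold scores, giving two numbers per model. Reporting how many pairs were skipped matters when comparing models trained on different corpora, since their vocabularies may differ.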