Jonathan Lautenschlager, Emma Sköldberg, Simon Hengchen et al.
This study addresses the task of Unknown Sense Detection in English and Swedish. The primary objective of this task is to determine whether the meaning of a particular word usage is documented in a dictionary. To this end, we compare sense entries with word usages from modern and historical corpora using a pre-trained Word-in-Context embedder, which allows us to model the task in a few-shot scenario. Additionally, we use human annotations on the target corpora to tune hyperparameters and evaluate our models using 5-fold cross-validation. Compared to a random sample from a corpus, our model considerably increases the number of detected word usages with non-recorded senses.
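The comparison between sense entries and word usages can be illustrated with a minimal sketch. It assumes that both have already been turned into vectors by some embedder, and that a usage is flagged as a candidate unknown sense when its best cosine similarity to any sense entry falls below a threshold. The toy vectors, the function names `cosine` and `is_unknown_sense`, the use of cosine similarity, and the threshold value are all illustrative assumptions, not the scoring used in the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_unknown_sense(usage_vec, sense_vecs, threshold):
    """Flag a usage whose closest dictionary sense is still below the threshold.

    Hypothetical decision rule: the threshold stands in for a hyperparameter
    that would be tuned on annotated data.
    """
    best = max((cosine(usage_vec, s) for s in sense_vecs), default=0.0)
    return best < threshold


# Toy stand-ins for embedded dictionary sense entries (assumed, not real embeddings).
sense_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# One usage close to a recorded sense, one far from every recorded sense.
known_usage = [0.9, 0.1, 0.0]
novel_usage = [0.0, 0.1, 0.9]

flag_known = is_unknown_sense(known_usage, sense_vecs, threshold=0.5)
flag_novel = is_unknown_sense(novel_usage, sense_vecs, threshold=0.5)
```

In this sketch only the second usage is flagged, because neither sense entry is similar to it. In practice the threshold would be chosen on held-out annotated usages, for example within each cross-validation fold.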