CL · Mar 4, 2024

Detection of Non-recorded Word Senses in English and Swedish

Jonathan Lautenschlager, Emma Sköldberg, Simon Hengchen, Dominik Schlechtweg

arXiv:2403.02285v24.24 citations · h-index: 9 · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of identifying undocumented word meanings, a task relevant to linguists and lexicographers. Its contribution is incremental, since it applies existing methods to new data.

The study tackles Unknown Sense Detection in English and Swedish by comparing dictionary sense entries with word usages from corpora, using a pre-trained Word-in-Context embedder in a few-shot scenario. Compared to a random sample from a corpus, the approach considerably increases the number of detected usages with non-recorded senses.

This study addresses the task of Unknown Sense Detection in English and Swedish. The primary objective of this task is to determine whether the meaning of a particular word usage is documented in a dictionary or not. For this purpose, sense entries are compared with word usages from modern and historical corpora using a pre-trained Word-in-Context embedder that allows us to model this task in a few-shot scenario. Additionally, we use human annotations on the target corpora to adapt hyperparameters and evaluate our models using 5-fold cross-validation. Compared to a random sample from a corpus, our model is able to considerably increase the detected number of word usages with non-recorded senses.
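The comparison between dictionary sense entries and word usages can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each usage and each sense entry has already been mapped to a vector by some Word-in-Context embedder, and it flags a usage as non-recorded when its best cosine similarity to any sense entry falls below a threshold. The function names, the toy vectors, the sense identifiers, and the threshold value of 0.5 are all hypothetical; in the setup described above, such a threshold would be a hyperparameter adapted on human annotations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_sense_match(usage_vec, sense_vecs):
    """Return (sense_id, similarity) for the closest dictionary sense."""
    scored = ((sid, cosine(usage_vec, vec)) for sid, vec in sense_vecs.items())
    return max(scored, key=lambda pair: pair[1])


def is_non_recorded(usage_vec, sense_vecs, threshold=0.5):
    """Flag a usage whose best match to any recorded sense is below threshold.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    _, similarity = best_sense_match(usage_vec, sense_vecs)
    return similarity < threshold


# Toy embeddings standing in for the output of a real embedder (assumption).
senses = {
    "bank.1_financial": [1.0, 0.0, 0.0],
    "bank.2_river": [0.0, 1.0, 0.0],
}
usage_known = [0.9, 0.1, 0.0]  # close to the financial sense
usage_novel = [0.0, 0.1, 0.9]  # far from both recorded senses

print(best_sense_match(usage_known, senses)[0])
print(is_non_recorded(usage_known, senses))
print(is_non_recorded(usage_novel, senses))
```

Taking the maximum similarity over all sense entries means a usage counts as recorded if it matches any one documented sense. Usages flagged by such a rule would still need review by a lexicographer.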
