A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change
This provides standardized benchmarking for researchers working on lexical semantic change detection, though it's largely incremental in comparing existing methods.
The authors systematically compared contextualized word embeddings for lexical semantic change detection under standardized conditions across eight benchmarks and multiple languages, finding that APD outperformed other approaches for graded change detection while XL-LEXEME performed best across word-in-context, word sense induction, and graded change detection tasks.
Contextualized embeddings are the preferred tool for modeling Lexical Semantic Change (LSC). Current evaluations typically focus on a specific task known as Graded Change Detection (GCD). However, performance comparison across work are often misleading due to their reliance on diverse settings. In this paper, we evaluate state-of-the-art models and approaches for GCD under equal conditions. We further break the LSC problem into Word-in-Context (WiC) and Word Sense Induction (WSI) tasks, and compare models across these different levels. Our evaluation is performed across different languages on eight available benchmarks for LSC, and shows that (i) APD outperforms other approaches for GCD; (ii) XL-LEXEME outperforms other contextualized models for WiC, WSI, and GCD, while being comparable to GPT-4; (iii) there is a clear need for improving the modeling of word meanings, as well as focus on how, when, and why these meanings change, rather than solely focusing on the extent of semantic change.