CL AI Aug 23, 2023

Semantic Change Detection for the Romanian Language

arXiv:2308.12131v1, 1 citation, h-index: 23
Originality: Synthesis-oriented
AI Analysis

This work addresses semantic change detection for Romanian, a low-resource language. Its contribution is incremental, since it applies existing methods to new data.

The paper tackles semantic change detection in a low-resource language by evaluating Word2Vec and ELMo models on English and Romanian datasets, and finds that the choice of model and of distance metric are the key factors influencing performance.

Automatic semantic change detection methods try to identify the changes that appear over time in the meaning of words by analyzing their usage in diachronic corpora. In this paper, we analyze different strategies to create static and contextual word embedding models, i.e., Word2Vec and ELMo, on real-world English and Romanian datasets. To test our pipeline and determine the performance of our models, we first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA). Afterward, we focus our experiments on a Romanian dataset, and we underline different aspects of semantic change in this low-resource language, such as meaning acquisition and loss. The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance measure used to calculate the score for detecting semantic change.
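
The abstract's point about the distance measure can be made concrete with a small sketch. The code below is not the paper's implementation; it only illustrates two common ways of turning embeddings from two time periods into a change score. It assumes that the vectors for both periods already live in a shared space (for static models such as Word2Vec this usually requires an alignment step that is not shown here), and all function names and toy vectors are hypothetical.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def static_change_score(vec_old, vec_new, distance=cosine_distance):
    """Score for static embeddings: one vector per word per period.

    Assumption: vec_old and vec_new are already in the same vector space.
    """
    return distance(vec_old, vec_new)


def average_pairwise_distance(usages_old, usages_new, distance=cosine_distance):
    """Score for contextual embeddings: one vector per usage of the word.

    Averages the distance over every (old usage, new usage) pair.
    """
    pairs = list(product(usages_old, usages_new))
    if not pairs:
        raise ValueError("both periods need at least one usage vector")
    return sum(distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy vectors, for illustration only.
old_vector = [1.0, 0.0]
new_vector = [0.0, 1.0]
score_cosine = static_change_score(old_vector, new_vector)
score_euclid = static_change_score(old_vector, new_vector, euclidean_distance)

old_usages = [[1.0, 0.0], [1.0, 0.0]]
new_usages = [[1.0, 0.0], [0.0, 1.0]]
score_apd = average_pairwise_distance(old_usages, new_usages)
```

Swapping the `distance` argument changes the score, and with it possibly the ranking of words by degree of change, which is one way to see why the choice of distance can matter as much as the choice of model.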

Code Implementations: 1 repo
