CL · Jan 13, 2022

NorDiaChange: Diachronic Semantic Change Dataset for Norwegian

arXiv:2201.05123v2 · 586 citations
AI Analysis

NorDiaChange provides a resource for researchers in computational and historical linguistics who study semantic change in Norwegian. The contribution is incremental, however: it applies existing annotation methods to a new language.

The authors address the lack of diachronic semantic change data for Norwegian by creating NorDiaChange, the first such dataset. It comprises about 80 nouns manually annotated for graded semantic change across time periods tied to historical events and technological developments.

We describe NorDiaChange: the first diachronic semantic change dataset for Norwegian. NorDiaChange comprises two novel subsets, covering about 80 Norwegian nouns manually annotated with graded semantic change over time. Both datasets follow the same annotation procedure and can be used interchangeably as train and test splits for each other. NorDiaChange covers the time periods related to pre- and post-war events, oil and gas discovery in Norway, and technological developments. The annotation was done using the DURel framework and two large historical Norwegian corpora. NorDiaChange is published in full under a permissive licence, complete with raw annotation data and inferred diachronic word usage graphs (DWUGs).

Code Implementations: 1 repo
