CL, Apr 17, 2021

DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

arXiv:2104.08540v3
665 citations
AI Analysis

This paper provides a dataset that researchers in computational linguistics and NLP can use to study semantic change, but its contribution is incremental because it builds on existing annotation methods.

The authors tackled the challenge of capturing word meaning over time by creating DWUG, the largest resource of graded contextualized, diachronic word meaning annotations in four languages, based on 100,000 human semantic proximity judgments.

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental annotation process, the choice for a clustering algorithm to group usages into senses, and possible - diachronic and synchronic - uses for this dataset.

Code Implementations: 1 repo
