CL · Mar 26, 2024

Enriching Word Usage Graphs with Cluster Definitions

Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev, Dominik Schlechtweg

arXiv:2403.18024v1 · LREC

Originality: Incremental advance

AI Analysis

This work provides enriched datasets for explainable semantic change modeling, offering a straightforward and extensible method for multiple languages, though it is incremental in nature.

The authors tackled the problem of enriching word usage graphs with sense definitions by generating cluster labels with fine-tuned language models. In human evaluation, the resulting definitions matched the existing clusters better than definitions selected from WordNet by two baseline systems.

We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. These labels are generated from scratch by fine-tuned encoder-decoder language models. Human evaluation has shown that these definitions match the existing clusters in WUGs better than the definitions chosen from WordNet by two baseline systems. At the same time, the method is straightforward to use and easy to extend to new languages. The resulting enriched datasets can be extremely helpful for moving on to explainable semantic change modeling.
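The abstract describes the labeling step only at a high level. The following is a hypothetical sketch, not the authors' code. It makes three assumptions. First, every usage in a WUG already carries a cluster id. Second, a fine-tuned definition generator, stubbed here as a plain `generate` callable, produces one definition per usage. Third, a cluster's label is the generated definition most similar to the other definitions in that cluster. The names `label_clusters`, `bow_cosine`, the selection rule and the toy data are all illustrative assumptions. Bag-of-words cosine similarity is swapped in for the sentence embeddings a real system would more likely use.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Callable, Dict, List, Tuple


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words counts (stand-in for sentence embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def label_clusters(
    usages: List[Tuple[str, int]],   # (usage sentence, cluster id) pairs from a WUG
    generate: Callable[[str], str],  # hypothetical definition generator
) -> Dict[int, str]:
    """Pick one generated definition per cluster as its label (assumed selection rule)."""
    defs_by_cluster: Dict[int, List[str]] = defaultdict(list)
    for sentence, cluster in usages:
        defs_by_cluster[cluster].append(generate(sentence))

    labels: Dict[int, str] = {}
    for cluster, defs in defs_by_cluster.items():
        # Score each definition by its summed similarity to the other definitions;
        # max() keeps the first candidate on ties.
        scores = [
            sum(bow_cosine(defs[i], defs[j]) for j in range(len(defs)) if j != i)
            for i in range(len(defs))
        ]
        labels[cluster] = defs[max(range(len(defs)), key=scores.__getitem__)]
    return labels


# Toy stand-in for a fine-tuned model: a fixed usage -> definition lookup.
toy_definitions = {
    "He sat on the bank of the river.": "sloping land beside a river",
    "Fish swam near the muddy bank.": "land alongside a river or lake",
    "The river bank was flooded.": "sloping land beside a body of water",
    "She deposited cash at the bank.": "a financial institution",
    "The bank raised interest rates.": "a financial institution that accepts deposits",
}
sentences = list(toy_definitions)
usages = [(s, 0) for s in sentences[:3]] + [(s, 1) for s in sentences[3:]]

labels = label_clusters(usages, toy_definitions.__getitem__)
print(labels)  # {0: 'sloping land beside a river', 1: 'a financial institution'}
```

In practice the lookup would be replaced by an actual sequence-to-sequence definition model and the bag-of-words similarity by embeddings. Note that in a two-definition cluster, such as cluster 1 here, both candidates score the same and the first one wins, so a real selection rule may need a better tie-breaker.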
