CL · Aug 6, 2023

3D-EX: A Unified Dataset of Definitions and Dictionary Examples

arXiv:2308.03043v2134 citations · h-index: 25 · Has Code
Originality: Synthesis-oriented
AI Analysis

3D-EX provides a centralized resource for NLP researchers working on tasks such as word embeddings and language models, but the contribution is incremental because it combines existing datasets.

The authors tackled the problem of inconsistent lexical resources in NLP by introducing 3D-EX, a unified dataset of definitions and dictionary examples, and reported that it could be effectively leveraged in downstream NLP tasks.

Definitions are a fundamental building block in lexicography, linguistics and computational semantics. In NLP, they have been used for retrofitting word embeddings or augmenting contextual representations in language models. However, lexical resources containing definitions exhibit a wide range of properties, which has implications in the behaviour of models trained and evaluated on them. In this paper, we introduce 3D-EX, a dataset that aims to fill this gap by combining well-known English resources into one centralized knowledge repository in the form of <term, definition, example> triples. 3D-EX is a unified evaluation framework with carefully pre-computed train/validation/test splits to prevent memorization. We report experimental results that suggest that this dataset could be effectively leveraged in downstream NLP tasks. Code and data are available at https://github.com/F-Almeman/3D-EX.
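
To make the idea of splits that prevent memorization concrete, here is a minimal sketch of one possible approach: assigning whole terms to a single split, so that no term seen in training reappears in validation or test. This is an assumption for illustration, not the authors' procedure. The `Triple` type, the `split_by_term` helper, the split ratios, and the toy rows are all hypothetical and are not taken from the 3D-EX repository.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one <term, definition, example> triple."""
    term: str
    definition: str
    example: str


def split_by_term(triples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Assign whole terms to train/validation/test so no term spans two splits."""
    # Group triples by lower-cased term.
    by_term = defaultdict(list)
    for triple in triples:
        by_term[triple.term.lower()].append(triple)

    # Shuffle the terms (not the triples) deterministically.
    terms = sorted(by_term)
    random.Random(seed).shuffle(terms)

    n_train = int(len(terms) * ratios[0])
    n_val = int(len(terms) * ratios[1])
    groups = {
        "train": terms[:n_train],
        "validation": terms[n_train:n_train + n_val],
        "test": terms[n_train + n_val:],
    }
    return {
        name: [t for term in group for t in by_term[term]]
        for name, group in groups.items()
    }


# Toy data, invented for illustration only.
toy = [
    Triple("bank", "a financial institution", "She deposited cash at the bank."),
    Triple("bank", "the land alongside a river", "They picnicked on the bank."),
    Triple("bark", "the outer layer of a tree", "The bark was rough."),
    Triple("bat", "a nocturnal flying mammal", "A bat flew out of the cave."),
    Triple("bolt", "a metal fastener", "Tighten the bolt."),
    Triple("bow", "a weapon for shooting arrows", "He drew the bow."),
]
splits = split_by_term(toy)
```

Splitting at the term level rather than the triple level keeps both senses of a word such as "bank" in the same split, which is the property that blocks a model from simply recalling a term it has already seen.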

Code Implementations: 1 repo
