CL · AI · LG · Aug 11, 2022

Word-Embeddings Distinguish Denominal and Root-Derived Verbs in Semitic

Ido Benbaji, Omri Doron, Adèle Hénot-Mortier

arXiv:2208.05721v1 · 1 citation · h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses a theoretical linguistics problem by providing empirical evidence from computational models, though the contribution is incremental, since it applies an existing method to a specific domain.

The study tested the Distributed Morphology hypothesis that denominal verbs are semantically closer to their source nouns than root-derived verbs are, using Hebrew word embeddings, and found that all four embedding models tested (fastText, GloVe, Word2Vec, AlephBERT) bear out this prediction.

Proponents of the Distributed Morphology framework have posited the existence of two levels of morphological word formation: a lower one, leading to loose input-output semantic relationships; and an upper one, leading to tight input-output semantic relationships. In this work, we propose to test the validity of this assumption in the context of Hebrew word embeddings. If the two-level hypothesis is borne out, we expect state-of-the-art Hebrew word embeddings to encode (1) a noun, (2) a denominal derived from it (via an upper-level operation), and (3) a verb related to the noun (via a lower-level operation on the noun's root), in such a way that the denominal (2) should be closer in the embedding space to the noun (1) than the related verb (3) is to the same noun (1). We report that this hypothesis is verified by four embedding models of Hebrew: fastText, GloVe, Word2Vec and AlephBERT. This suggests that word embedding models are able to capture complex and fine-grained semantic properties that are morphologically motivated.
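The per-triple comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code or evaluation protocol: it assumes cosine similarity as the measure of closeness (the abstract only says "closer in the embedding space") and assumes the vectors for the noun (1), the denominal (2) and the root-derived verb (3) have already been looked up from a model such as fastText. The helper names `cosine`, `prediction_holds` and `agreement_rate` are hypothetical, and the three-dimensional vectors are invented toy values, not real Hebrew embeddings.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prediction_holds(noun: Vector, denominal: Vector, root_verb: Vector) -> bool:
    """True if the denominal (2) is closer to the noun (1) than the root-derived verb (3) is."""
    return cosine(noun, denominal) > cosine(noun, root_verb)


def agreement_rate(triples: Iterable[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of (noun, denominal, root_verb) triples for which the prediction holds."""
    triples = list(triples)
    hits = sum(prediction_holds(n, d, r) for n, d, r in triples)
    return hits / len(triples)


# Toy vectors invented for illustration only (hypothetical, NOT model output).
noun = [1.0, 0.0, 0.0]
denominal = [0.9, 0.1, 0.0]
root_verb = [0.2, 0.9, 0.1]

print(prediction_holds(noun, denominal, root_verb))  # True
# The second triple swaps the roles, so the prediction fails for it.
print(agreement_rate([(noun, denominal, root_verb), (noun, root_verb, denominal)]))  # 0.5
```

With real embeddings, one would run `agreement_rate` over all collected triples for each model. Whether the resulting proportion is significantly above chance would then need a statistical test, which this sketch does not attempt.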

