CL · Jun 2, 2024

Evaluating Distributed Representations for Multi-Level Lexical Semantics: A Research Proposal

arXiv:2406.00751v2
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better evaluation of word representations in natural language processing, but it appears incremental as it builds on existing methods without proposing new paradigms.

The research tackles the problem of evaluating how well neural network-based distributed representations capture multi-level lexical semantics. It formalizes local, global, and mixed levels and assesses language models using multilingual datasets and linguistic theories, aiming to bridge computational models and lexical semantics.

Modern neural networks (NNs), trained on extensive raw sentence data, construct distributed representations by compressing individual words into dense, continuous, high-dimensional vectors. These representations are expected to capture multi-level lexical meaning. In this thesis, our objective is to examine how effectively distributed representations from NNs encode lexical meaning. First, we identify and formalize three levels of lexical semantics: local, global, and mixed. Then, for each level, we evaluate language models by collecting or constructing multilingual datasets, leveraging various language models, and applying linguistic analysis theories. This thesis builds a bridge between computational models and lexical semantics, with the aim that each can inform the other.

