CL, LG | Jun 10, 2024

Optimal synthesis embeddings

arXiv:2406.10259v1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of building effective embeddings for word sets and sentences. It is rated an incremental advance because it builds on existing embedding techniques.

The paper tackles the problem of composing word embeddings. It proposes a method that places the new vector at the same distance from each of its constituent word vectors and makes that distance as small as possible. The method applies to both static and contextualized representations, and to sentences as well as unordered word sets. It is evaluated on data augmentation and sentence classification tasks, and performs strongly on probing tasks for simple linguistic features.

In this paper we introduce a word embedding composition method based on an intuitive idea: a fair embedding representation for a given set of words should be equidistant from the vector representation of each of its constituents, and this distance should be minimized. The embedding composition method works with both static and contextualized word representations. It can be applied to create representations of sentences and also to learn representations of sets of words that are not necessarily organized as a sequence. We theoretically characterize the conditions for the existence of this type of representation and derive the solution. We evaluate the method in data augmentation and sentence classification tasks, investigating several design choices of embeddings and composition methods. We show that our approach excels in solving probing tasks designed to capture simple linguistic features of sentences.
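
The equidistance idea in the abstract can be illustrated with a small geometric sketch. This is not the paper's derivation or code. It is a minimal, dependency-free example that assumes the word vectors are affinely independent, and the function name `equidistant_point` is hypothetical. Under that assumption, the condition that x is equally far from every w_j reduces to the linear equations 2 d_j . (x - w_1) = ||d_j||^2, with d_j = w_j - w_1. The smallest common distance is reached when x - w_1 lies in the span of the d_j.

```python
def equidistant_point(vectors):
    """Return the point equidistant from all `vectors` with minimal common distance.

    Illustrative sketch only: assumes the input vectors are affinely
    independent, so the Gram system below has a unique solution.
    """
    base = vectors[0]
    # Difference vectors d_j = w_j - w_1.
    diffs = [[a - b for a, b in zip(v, base)] for v in vectors[1:]]
    m = len(diffs)
    if m == 0:
        return list(base)

    # Gram matrix G[i][j] = d_i . d_j.
    gram = [[sum(p * q for p, q in zip(di, dj)) for dj in diffs] for di in diffs]
    # With x - w_1 = sum_k a_k d_k, equidistance becomes 2 G a = diag(G).
    # Build the augmented matrix [2G | diag(G)].
    aug = [[2.0 * gram[i][j] for j in range(m)] + [gram[i][i]] for i in range(m)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("vectors are not affinely independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]

    coeffs = [aug[i][m] / aug[i][i] for i in range(m)]
    # Map the coefficients back to the original space: x = w_1 + sum_k a_k d_k.
    return [b + sum(c * d[k] for c, d in zip(coeffs, diffs))
            for k, b in enumerate(base)]


# Toy 2-D "embeddings"; real word vectors would have hundreds of dimensions.
point = equidistant_point([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
print(point)
```

For the three toy points above, the result is the circumcenter (1, 1), which lies at distance sqrt(2) from each input. In practice a linear algebra library would replace the hand-written elimination, and inputs that violate the independence assumption would need separate handling.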
