LG · ML · Oct 13, 2019

Constrained Non-Affine Alignment of Embeddings

arXiv:1910.05862v4 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses a developing area: analyzing and adjusting embeddings after they have been created, for domains such as large language models and image analysis. The contribution appears incremental, since it builds on existing Domain Adversarial Networks and adds constraints to them.

The paper tackles the problem of removing undesired features from embeddings while preserving the essential structure of the data. It proposes a constrained non-affine alignment method that, according to the authors' experiments, significantly outperforms the state-of-the-art unsupervised algorithm on several datasets.

Embeddings are one of the fundamental building blocks for data analysis tasks. Embeddings are already essential tools for large language models and image analysis, and their use is being extended to many other research domains. The generation of these distributed representations is often a data- and computation-expensive process; yet the holistic analysis and adjustment of them after they have been created is still a developing area. In this paper, we first propose a very general quantitative measure for the presence of a feature in the embedding data, based on whether the feature can be learned from the embeddings. We then devise a method to remove or alleviate undesired features in the embedding while retaining the essential structure of the data. We use a Domain Adversarial Network (DAN) to generate a non-affine transformation, but we add constraints to ensure the essential structure of the embedding is preserved. Our empirical results demonstrate that the proposed algorithm significantly outperforms the state-of-the-art unsupervised algorithm on several data sets, including novel applications from the industry.
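
The idea of measuring a feature's presence by whether it can be learned can be illustrated with a simple probe. The sketch below is not the paper's measure: it assumes a binary feature, a logistic-regression probe trained by gradient descent, and held-out accuracy as the score. The function name `probe_accuracy`, its parameters, and the synthetic data are all hypothetical. Accuracy near 0.5 suggests this probe cannot recover the feature; accuracy well above 0.5 suggests the feature is present in the embedding.

```python
import numpy as np

def probe_accuracy(X, y, epochs=300, lr=0.1, test_frac=0.3, seed=0):
    """Held-out accuracy of a logistic-regression probe predicting binary y from X.

    Illustrative stand-in for a learnability-based feature measure (assumption,
    not the measure defined in the paper).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    # Standardize with training statistics only, to avoid leaking test data.
    mu = X[train].mean(axis=0)
    sd = X[train].std(axis=0) + 1e-8
    X_tr = (X[train] - mu) / sd
    X_te = (X[test] - mu) / sd

    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))  # predicted probabilities
        g = p - y[train]                            # gradient of the log loss
        w -= lr * (X_tr.T @ g) / len(train)
        b -= lr * g.mean()

    pred = (X_te @ w + b) > 0
    return float((pred == y[test].astype(bool)).mean())

# Synthetic example: 600 embeddings of dimension 16 and a random binary feature.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=600)
X_clean = rng.normal(size=(600, 16))   # carries no information about y
X_leaky = X_clean.copy()
X_leaky[:, 0] += 3.0 * y               # the feature leaks into one coordinate

acc_clean = probe_accuracy(X_clean, y)
acc_leaky = probe_accuracy(X_leaky, y)
print(f"clean: {acc_clean:.2f}  leaky: {acc_leaky:.2f}")
```

A linear probe only detects linearly recoverable features. A stronger learner would give a stricter test, which matters when the transformation being evaluated is itself non-affine.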

