CL LG ML · Oct 14, 2024

On Debiasing Text Embeddings Through Context Injection

arXiv:2410.12874v2 · h-index: 4

Originality: Incremental advance

AI Analysis

This paper addresses bias in text embeddings for NLP applications. It is an incremental advance: it builds on existing debiasing techniques by exploiting the context understanding of modern embedding models.

The paper reviews 19 embedding models, quantifying their biases and how well they respond to context injection as a means of debiasing. It finds that higher-performing models capture more bias but also incorporate context better, and it designs a top-k retrieval algorithm with a dynamically selected k that retrieves all relevant gendered and neutral chunks.

Recent advances in Natural Language Processing (NLP) have made it increasingly feasible to build applications that leverage textual data. Generally, the core of these applications relies on a good semantic representation of text as vectors, produced by embedding models. However, it has been shown that these embeddings capture and perpetuate biases already present in text. While a few techniques have been proposed to debias embeddings, they do not take advantage of the recent advances in context understanding of modern embedding models. In this paper, we fill this gap by reviewing 19 embedding models, quantifying their biases and how well they respond to context injection as a means of debiasing. We show that higher-performing models are more prone to capturing biases, but are also better at incorporating context. Surprisingly, we find that while models can easily embed affirmative semantics, they fail at embedding neutral semantics. Finally, in a retrieval task, we show that biases in embeddings can lead to undesirable outcomes. We use these insights to design a simple algorithm for top-$k$ retrieval, where $k$ is dynamically selected. We show that our algorithm is able to retrieve all relevant gendered and neutral chunks.
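The abstract does not spell out how $k$ is chosen, so the following is only an illustrative sketch of what a dynamically selected $k$ can look like, not the paper's algorithm. It assumes chunks are ranked by cosine similarity to the query and cuts the ranking at the largest drop between consecutive scores. The function names `cosine` and `dynamic_top_k` and the `min_k` parameter are hypothetical, introduced here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dynamic_top_k(query_vec, chunk_vecs, min_k=1):
    """Return indices of the retrieved chunks, best first.

    Assumption (not taken from the paper): k is set at the largest gap
    between consecutive similarity scores, keeping at least min_k chunks.
    """
    # Score every chunk and rank from most to least similar.
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(reverse=True)

    if len(scored) <= min_k:
        return [i for _, i in scored]

    # Find the cut position with the largest score drop just before it.
    best_cut, best_gap = min_k, -1.0
    for cut in range(min_k, len(scored)):
        gap = scored[cut - 1][0] - scored[cut][0]
        if gap > best_gap:
            best_gap, best_cut = gap, cut

    return [i for _, i in scored[:best_cut]]


# Toy 2-D "embeddings": three chunks close to the query, two far from it.
query = [1.0, 0.0]
chunks = [[1.0, 0.1], [1.0, -0.1], [0.9, 0.2], [0.1, 1.0], [0.0, 1.0]]
retrieved = dynamic_top_k(query, chunks)
```

In this toy example the three chunks pointing in roughly the query's direction are kept and the two near-orthogonal ones are dropped, so $k = 3$ without being fixed in advance. A fixed $k$ would instead either truncate a cluster of equally relevant chunks or pad the result with irrelevant ones.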
