CL, Jul 21, 2021

Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

arXiv:2107.10181v2, 11 citations
Originality: Incremental advance
AI Analysis

This work addresses bias in NLP applications for speakers of Indian languages, but it is incremental: it extends existing monolingual debiasing methods to a multilingual setting.

The paper tackles the problem of debiasing multilingual word embeddings for Hindi, Bengali, and Telugu, in addition to English, and reports state-of-the-art bias-mitigation performance for these languages.

In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so that it generalizes well in a multilingual setting. We consider different methods to quantify bias and different debiasing approaches for monolingual as well as multilingual settings. We demonstrate the significance of our bias-mitigation approach on downstream NLP applications. Our proposed methods establish state-of-the-art performance for debiasing multilingual embeddings for three Indian languages (Hindi, Bengali, and Telugu) in addition to English. We believe that our work will open up new opportunities in building unbiased downstream NLP applications that are inherently dependent on the quality of the word embeddings used.

Code Implementations: 1 repo
