CL Jul 21, 2021

Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

Srijan Bansal, Vishal Garimella, Ayush Suhane, Animesh Mukherjee

arXiv:2107.10181v22.011 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses bias in NLP applications for speakers of Indian languages, but it is incremental: it extends existing monolingual debiasing methods to a multilingual setting.

The paper tackles the problem of debiasing multilingual word embeddings for Hindi, Bengali, Telugu, and English, and reports state-of-the-art bias mitigation for these languages.

In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so as to generalize well in a multilingual setting. We consider different methods to quantify bias and different debiasing approaches for monolingual as well as multilingual settings. We demonstrate the significance of our bias-mitigation approach on downstream NLP applications. Our proposed methods establish state-of-the-art performance for debiasing multilingual embeddings for three Indian languages (Hindi, Bengali, and Telugu) in addition to English. We believe that our work will open up new opportunities in building unbiased downstream NLP applications that are inherently dependent on the quality of the word embeddings used.
