On Measuring and Mitigating Biased Inferences of Word Embeddings
This work addresses bias in AI systems, which matters for fairness and reliability, but its contribution is incremental because it builds on existing bias mitigation techniques.
The paper tackles the problem of biased inferences from word embeddings by measuring stereotypes via natural language inference. It demonstrates a reduction in invalid inferences through bias mitigation strategies on static embeddings such as GloVe and, for gender bias, extends these strategies to contextualized embeddings such as ELMo and BERT.
Word embeddings carry stereotypical connotations from the text they are trained on, which can lead to invalid inferences in downstream models that rely on them. We use this observation to design a mechanism for measuring stereotypes using the task of natural language inference. We demonstrate a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe). Further, we show that for gender bias, these techniques extend to contextualized embeddings when applied selectively only to the static components of contextualized embeddings (ELMo, BERT).
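The measurement and mitigation ideas described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the abstract: the word lists, the sentence template, the `predict` interface (a hypothetical callable returning label probabilities for a premise and hypothesis), the use of the mean "neutral" probability as the score, and the choice of linear projection as the mitigation step. The premise is that a sentence about an occupation neither entails nor contradicts the same sentence about a gendered subject, so an unbiased model should label such pairs as neutral.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Illustrative word lists and template (assumptions, not from the paper).
OCCUPATIONS = ["doctor", "nurse"]
GENDERED = ["man", "woman"]
TEMPLATE = "The {subject} bought a bagel."


def build_pairs(
    occupations: Sequence[str],
    gendered: Sequence[str],
    template: str = TEMPLATE,
) -> List[Tuple[str, str]]:
    """Build (premise, hypothesis) pairs that differ only in the subject."""
    return [
        (template.format(subject=occ), template.format(subject=gen))
        for occ, gen in product(occupations, gendered)
    ]


def net_neutral(
    pairs: Sequence[Tuple[str, str]],
    predict: Callable[[str, str], Dict[str, float]],
) -> float:
    """Mean probability of the 'neutral' label over all pairs.

    `predict` is a hypothetical stand-in for any NLI model. A lower score
    means the model more often draws an entailment or contradiction that
    the sentences do not support.
    """
    probs = [predict(premise, hypothesis)["neutral"] for premise, hypothesis in pairs]
    return sum(probs) / len(probs)


def project_out(vec: Sequence[float], direction: Sequence[float]) -> List[float]:
    """Remove the component of `vec` that lies along `direction`.

    Linear projection is one common mitigation for static embeddings; it is
    used here as an assumed example of a mitigation strategy.
    """
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        return list(vec)
    scale = sum(v * d for v, d in zip(vec, direction)) / norm_sq
    return [v - scale * d for v, d in zip(vec, direction)]
```

In use, one would compute `net_neutral` with the same NLI model before and after applying `project_out` to each static word vector along an estimated bias direction; a higher score afterwards would suggest fewer invalid inferences under these assumptions.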