Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings
This work addresses bias in word embeddings for multilingual NLP applications, but its contribution is incremental because it builds on prior work on cultural biases.
The study finds that grammatical gender associations in crosslinguistic word embeddings outweigh topical gender bias, and that both effects may be reduced through corpus lemmatization, with implications for machine translation.
Recent research has demonstrated that vector space models of semantics can reflect undesirable biases in human culture. Our investigation of crosslinguistic word embeddings reveals that topical gender bias interacts with, and is surpassed in magnitude by, the effect of grammatical gender associations, and both may be attenuated by corpus lemmatization. This finding has implications for downstream applications such as machine translation.
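As a rough illustration of what a "gender association" in an embedding space can mean, the sketch below scores a word vector by its mean cosine similarity to masculine anchor vectors minus its mean cosine similarity to feminine anchor vectors. This is a generic association measure chosen for illustration, not necessarily the measure used in the study. The function names and the two-dimensional toy vectors are invented for the example and do not come from any real embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_association(word_vec, masc_vecs, fem_vecs):
    """Mean cosine to masculine anchors minus mean cosine to feminine anchors.

    Positive values lean masculine, negative values lean feminine.
    (Hypothetical helper, written for this illustration only.)
    """
    masc = sum(cosine(word_vec, v) for v in masc_vecs) / len(masc_vecs)
    fem = sum(cosine(word_vec, v) for v in fem_vecs) / len(fem_vecs)
    return masc - fem


# Toy 2-d vectors, invented for the example (not real embeddings).
masc_anchors = [[1.0, 0.0]]
fem_anchors = [[0.0, 1.0]]

score_masc_noun = gender_association([0.9, 0.1], masc_anchors, fem_anchors)
score_fem_noun = gender_association([0.1, 0.9], masc_anchors, fem_anchors)
```

With real embeddings, one could compute such scores once on a raw corpus and once on a lemmatized version of the same corpus, then compare their magnitudes. Whether that mirrors the study's actual procedure is an assumption.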