CL Oct 15, 2021

Socially Aware Bias Measurements for Hindi Language Representations

Vijit Malik, Sunipa Dev, Akihiro Nishi, Nanyun Peng, Kai-Wei Chang

arXiv:2110.07871v230.5633 citations Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses bias measurement for Hindi speakers, highlighting cultural and linguistic awareness, but it is incremental as it extends existing bias studies to a new language.

The paper tackled the problem of societal biases in Hindi language representations, focusing on caste and religion, and demonstrated that biases are unique to specific languages and encoded differently across them.

Language representations are efficient tools used across NLP applications, but they are rife with encoded societal biases. These biases have been studied extensively, but primarily in English language representations and for biases common in Western society. In this work, we investigate biases present in Hindi language representations, focusing on caste- and religion-associated biases. We demonstrate how biases are unique to specific language representations based on the history and culture of the region in which the language is widely spoken, and how the same societal bias (such as binary gender-associated bias) is encoded by different words and text spans across languages. Our findings highlight the need for cultural awareness and attention to linguistic artifacts when modeling language representations, in order to better understand the encoded biases.
