CL Dec 20, 2022

Geographic and Geopolitical Biases of Language Models

CMU

arXiv:2212.10408v117.9142 citations h-index: 33

Originality: Incremental advance

AI Analysis

This work addresses fairness issues for users from underrepresented regions in AI applications, though its contribution to bias analysis is incremental.

The study quantifies geographic and geopolitical biases in pretrained language models (PLMs). It finds that their representations capture country-to-country associations well, but that this knowledge is unevenly shared across languages, and that the models over-amplify geopolitical favoritism at inference time.

Pretrained language models (PLMs) often fail to fairly represent target users from certain world regions because those regions are under-represented in training datasets. With recent PLMs trained on enormous data sources, quantifying their potential biases is difficult, due to their black-box nature and the sheer scale of the data. In this work, we devise an approach to study the geographic bias (and knowledge) present in PLMs, proposing a Geographic-Representation Probing Framework that adopts a self-conditioning method coupled with entity-country mappings. Our findings suggest that PLMs' representations map surprisingly well to the physical world in terms of country-to-country associations, but that this knowledge is unequally shared across languages. Last, we explain how large PLMs, despite exhibiting notions of geographical proximity, over-amplify geopolitical favouritism at inference time.
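
The abstract does not spell out how entity-country mappings are turned into country-to-country associations, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that text generated by a PLM while conditioned on a source country is already available, and it counts which countries the mentioned entities belong to. The mapping `ENTITY_TO_COUNTRY`, the function `country_association`, and the example generations are all hypothetical; the self-conditioning step itself is not shown.

```python
from collections import Counter

# Hypothetical entity-to-country mapping (illustrative only, not from the paper).
ENTITY_TO_COUNTRY = {
    "paris": "France",
    "lyon": "France",
    "berlin": "Germany",
    "munich": "Germany",
    "tokyo": "Japan",
}


def country_association(texts, entity_to_country):
    """Return each country's share of the recognised entity mentions in texts."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            # Strip surrounding punctuation so "Lyon." matches "lyon".
            token = token.strip(".,;:!?\"'()")
            country = entity_to_country.get(token)
            if country is not None:
                counts[country] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: n / total for country, n in counts.items()}


# Hypothetical generations, standing in for text a PLM produced
# while conditioned on the source country "France".
generations_for_france = [
    "She took the train from Paris to Lyon.",
    "The flight from Paris landed in Berlin.",
]

shares = country_association(generations_for_france, ENTITY_TO_COUNTRY)
for country, share in sorted(shares.items()):
    print(f"France -> {country}: {share:.2f}")
```

Repeating this for every source country would give one row per country of an association matrix, which could then be compared against physical proximity or across languages. Whitespace tokenisation is a simplification here; multi-word entities would need a proper entity matcher.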
