CL · Apr 13, 2021

On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach

Francisco Valentini, Germán Rosati, Damián Blasi, Diego Fernandez Slezak, Edgar Altszyler

arXiv:2104.06474v2 · 20.0223 citations · Has Code

Originality: Incremental advance

AI Analysis

The PMI-based metric gives researchers and practitioners a more interpretable tool for measuring biases in texts. The advance appears incremental, since it offers an alternative method rather than a fundamental breakthrough.

The paper tackles the lack of transparency in bias metrics based on word embeddings by proposing a PMI-based alternative that quantifies biases through interpretable conditional probabilities and odds ratios. It shows that this alternative produces results similar to embedding-based methods when capturing real-world gender gaps in large corpora.

In recent years, word embeddings have been widely used to measure biases in texts. Even if they have proven to be effective in detecting a wide variety of biases, metrics based on word embeddings lack transparency and interpretability. We analyze an alternative PMI-based metric to quantify biases in texts. It can be expressed as a function of conditional probabilities, which provides a simple interpretation in terms of word co-occurrences. We also prove that it can be approximated by an odds ratio, which allows estimating confidence intervals and statistical significance of textual biases. This approach produces similar results to metrics based on word embeddings when capturing gender gaps of the real world embedded in large corpora.
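The two properties named in the abstract can be illustrated with a small sketch. It assumes the bias of a set of context words C toward target group A versus group B is the difference PMI(C, A) − PMI(C, B), which simplifies to log P(C|A) / P(C|B), and that the odds-ratio approximation is computed from a 2×2 table of co-occurrence counts with the usual Wald interval for a log odds ratio. The function names, the way counts are collected, and the counts themselves are hypothetical and are not taken from the paper or its code.

```python
import math


def pmi_bias(n_ac, n_a, n_bc, n_b):
    """Bias as a log ratio of conditional probabilities: log P(C|A) - log P(C|B).

    n_ac: co-occurrences of the context words C with target words A
    n_a:  all co-occurrences involving target words A
    n_bc, n_b: the same counts for target words B
    """
    return math.log((n_ac / n_a) / (n_bc / n_b))


def log_odds_ratio_ci(n_ac, n_a, n_bc, n_b, z=1.96):
    """Log odds ratio from the 2x2 table, with a Wald confidence interval.

    z=1.96 corresponds to an approximate 95% interval.
    """
    n_a_not_c = n_a - n_ac  # A co-occurrences with words outside C
    n_b_not_c = n_b - n_bc  # B co-occurrences with words outside C
    log_or = math.log((n_ac / n_a_not_c) / (n_bc / n_b_not_c))
    # Standard error of a log odds ratio: sqrt of the summed reciprocal cells.
    se = math.sqrt(1 / n_ac + 1 / n_a_not_c + 1 / n_bc + 1 / n_b_not_c)
    return log_or, (log_or - z * se, log_or + z * se)


# Hypothetical counts, chosen only to illustrate the calculation.
n_ac, n_a = 300, 100_000
n_bc, n_b = 150, 120_000

bias = pmi_bias(n_ac, n_a, n_bc, n_b)
log_or, (ci_low, ci_high) = log_odds_ratio_ci(n_ac, n_a, n_bc, n_b)

print(f"PMI-based bias: {bias:.4f}")
print(f"log odds ratio: {log_or:.4f}")
print(f"approx. 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

When the context words are rare relative to all co-occurrences, the odds n_ac / (n_a − n_ac) are close to the probability n_ac / n_a, so the two quantities nearly coincide. An interval that excludes zero would then indicate a statistically significant bias under these assumptions.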
