CL · AI · Jan 22, 2023

An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models

arXiv:2301.09211v1234 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses societal biases in language models, a concern for both researchers and practitioners. It is an incremental advance because it builds on existing metrics and analyses.

The study measures implicit representational harms in pre-trained language models toward 13 marginalized demographics. It proposes a new metric, analyzes 24 models, and finds that prioritizing depth over width mitigates harms in some cases.

Large-scale Pre-Trained Language Models (PTLMs) capture knowledge from massive human-written data, which contains latent societal biases and toxic content. In this paper, we leverage the primary task of PTLMs, i.e., language modeling, and propose a new metric to quantify manifested implicit representational harms in PTLMs towards 13 marginalized demographics. Using this metric, we conduct an empirical analysis of 24 widely used PTLMs. Our analysis provides insights into the correlation between the proposed metric and other related metrics for representational harm. We observe that our metric correlates with most of the gender-specific metrics in the literature. Through extensive experiments, we explore the connections between PTLM architectures and representational harms across two dimensions: depth and width of the networks. We find that prioritizing depth over width mitigates representational harms in some PTLMs. Our code and data can be found at https://github.com/microsoft/SafeNLP.
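The abstract does not spell out how the metric is computed, so the following is only a minimal sketch of one way a language-modeling-based comparison could look, not the paper's definition. It assumes that per-token log-probabilities are already available for each statement, and it scores a model by how often it finds a harmful statement less likely (higher perplexity) than a benign one. The function names and the toy numbers are hypothetical and carry no empirical meaning.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity of one statement from its per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def pairwise_preference_score(
    harmful_ppl: Sequence[float], benign_ppl: Sequence[float]
) -> float:
    """Fraction of (harmful, benign) pairs in which the harmful statement has
    the higher perplexity. Ties count as half a win.

    Hypothetical scoring rule for illustration: 1.0 means the model always
    finds harmful statements less likely than benign ones, 0.0 the opposite.
    """
    if not harmful_ppl or not benign_ppl:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for h in harmful_ppl:
        for b in benign_ppl:
            if h > b:
                wins += 1.0
            elif h == b:
                wins += 0.5
    return wins / (len(harmful_ppl) * len(benign_ppl))


if __name__ == "__main__":
    # Toy log-probabilities (made up, not model output).
    harmful = [perplexity([-2.0, -2.0]), perplexity([-1.0, -1.0])]
    benign = [perplexity([-1.5, -1.5])]
    print(pairwise_preference_score(harmful, benign))
```

A rank-based score like this depends only on the ordering of perplexities, which makes it easier to compare across models with different vocabularies than raw perplexity values would be. That is a design choice of this sketch, not a claim about the paper.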

Code Implementations: 1 repo
