Measuring Model Biases in the Absence of Ground Truth
This work addresses the challenge of bias assessment in machine learning for practitioners who lack annotated data. The contribution is incremental, since it builds on existing association metrics.
The paper tackles the problem of measuring model biases without requiring ground truth labels or fully annotated datasets, by introducing a method that ranks biases learned by classification models using association metrics like normalized pointwise mutual information (nPMI). It demonstrates this approach with gender identity labels in image classification and releases an open-source visualization tool.
The measurement of bias in machine learning often focuses on model performance across identity subgroups (such as man and woman) with respect to ground truth labels. However, these methods do not directly measure the associations that a model may have learned, for example between labels and identity subgroups. Further, measuring a model's bias requires a fully annotated evaluation dataset which may not be easily available in practice. We present an elegant mathematical solution that tackles both issues simultaneously, using image classification as a working example. By treating a classification model's predictions for a given image as a set of labels analogous to a bag of words, we rank the biases that a model has learned with respect to different identity labels. We use (man, woman) as a concrete example of an identity label set (although this set need not be binary), and present rankings for the labels that are most biased towards one identity or the other. We demonstrate how the statistical properties of different association metrics can lead to different rankings of the most "gender biased" labels, and conclude that normalized pointwise mutual information (nPMI) is most useful in practice. Finally, we announce an open-sourced nPMI visualization tool using TensorBoard.
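The bag-of-labels idea above can be illustrated with a minimal sketch. It assumes the standard definition nPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / (-ln p(x, y)), with probabilities estimated as the fraction of images whose predicted label set contains the label(s), and it ranks each label by the difference between its nPMI with one identity label and its nPMI with the other. The function names `npmi` and `rank_npmi_gap`, the handling of edge cases, and the toy predictions are illustrative assumptions, not the authors' implementation or the API of the released tool.

```python
import math
from collections import Counter


def npmi(p_xy, p_x, p_y):
    """Normalized PMI in [-1, 1]; -1 is the limiting value when x and y never co-occur."""
    if p_xy == 0.0:
        return -1.0
    if p_xy == 1.0:
        # Degenerate case (assumption): both labels appear on every image,
        # so PMI is 0 and the normalizer is 0; treat as "no association".
        return 0.0
    return math.log(p_xy / (p_x * p_y)) / (-math.log(p_xy))


def rank_npmi_gap(predictions, id_a, id_b):
    """Rank labels by nPMI(label, id_a) - nPMI(label, id_b).

    predictions: iterable of label collections, one per image (model outputs,
    not ground truth). Positive gaps lean towards id_a, negative towards id_b.
    """
    predictions = [set(labels) for labels in predictions]
    n = len(predictions)
    label_counts = Counter()
    pair_counts = Counter()
    for labels in predictions:
        label_counts.update(labels)
        for identity in (id_a, id_b):
            if identity in labels:
                pair_counts.update((label, identity) for label in labels)

    gaps = {}
    for label, count in label_counts.items():
        if label in (id_a, id_b):
            continue
        scores = [
            npmi(pair_counts[(label, identity)] / n,
                 label_counts[identity] / n,
                 count / n)
            for identity in (id_a, id_b)
        ]
        gaps[label] = scores[0] - scores[1]
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up predictions purely for illustration.
toy_predictions = [
    {"man", "tie", "suit"},
    {"man", "tie"},
    {"woman", "dress"},
    {"woman", "dress", "smile"},
    {"man", "smile"},
    {"woman", "smile"},
]
for label, gap in rank_npmi_gap(toy_predictions, "man", "woman"):
    print(f"{label}: {gap:+.3f}")
```

Labels at the top of the ranking co-occur more strongly with the first identity label and labels at the bottom with the second. Because only model predictions are counted, no annotated evaluation set is needed. Swapping `npmi` for another association metric in the same loop is one way to see how the choice of metric changes the ranking.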