CL AI HC LG Nov 16, 2021

Who Decides if AI is Fair? The Labels Problem in Algorithmic Auditing

arXiv:2111.08723v1, 4 citations

Originality: Incremental advance

AI Analysis

This paper addresses a critical issue for AI auditors and policymakers working in high-stakes settings. It highlights the practical trade-off between annotation cost and the reliability of fairness audits, though its contribution is incremental.

The study tackled the problem of unreliable 'ground truth' labels in AI auditing datasets, showing that label quality significantly distorts audit results, with disparities between urban and rural populations disappearing after label cleaning.

Labelled "ground truth" datasets are routinely used to evaluate and audit AI algorithms applied in high-stakes settings. However, there do not exist widely accepted benchmarks for the quality of labels in these datasets. We provide empirical evidence that quality of labels can significantly distort the results of algorithmic audits in real-world settings. Using data annotators typically hired by AI firms in India, we show that fidelity of the ground truth data can lead to spurious differences in performance of ASRs between urban and rural populations. After a rigorous, albeit expensive, label cleaning process, these disparities between groups disappear. Our findings highlight how trade-offs between label quality and data annotation costs can complicate algorithmic audits in practice. They also emphasize the need for development of consensus-driven, widely accepted benchmarks for label quality.
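The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the utterances, the group names as dictionary keys, and the helpers `word_error_rate` and `group_wer` are all hypothetical, and word error rate (WER) is assumed as the ASR performance metric. The toy records are constructed so that the ASR output is correct for both groups, but the cheap reference transcript for the rural utterance drops a word. Scoring against that noisy label produces a gap between groups that vanishes once the cleaned label is used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Standard dynamic-programming Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def group_wer(records: list[dict], label_key: str) -> dict[str, float]:
    """Mean WER per group, scoring ASR output against the chosen label column."""
    totals: dict[str, list[float]] = {}
    for rec in records:
        wer = word_error_rate(rec[label_key], rec["asr_output"])
        totals.setdefault(rec["group"], []).append(wer)
    return {group: sum(v) / len(v) for group, v in totals.items()}


# Hypothetical toy records, invented purely for illustration.
RECORDS = [
    {"group": "urban",
     "asr_output": "turn on the light",
     "noisy_label": "turn on the light",
     "clean_label": "turn on the light"},
    {"group": "rural",
     "asr_output": "open the water pump",
     "noisy_label": "open water pump",       # annotator dropped a word
     "clean_label": "open the water pump"},
]

noisy = group_wer(RECORDS, "noisy_label")
clean = group_wer(RECORDS, "clean_label")

print("WER against noisy labels:", noisy)
print("WER against clean labels:", clean)
```

In this toy case the audit against noisy labels charges the ASR with an error on the rural utterance that is really an annotation error, which is the kind of spurious disparity the abstract describes.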
