LG · CY · ST · Mar 20, 2024

Optimal Transport for Fairness: Archival Data Repair using Small Research Data Sets

arXiv:2403.13864v1 · 5 citations · h-index: 2 · 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient fairness-repair algorithms under regulations such as the AI Act. The advance is incremental, because it builds on existing optimal transport methods.

The paper tackles the problem of repairing unfairness in large archival training data using only a small $S|U$-labelled subset (the research data). It designs optimal transport-based repair plans on that subset and applies them to the archival data, aiming for conditional independence between the protected attributes ($S$) and the features ($X$), given the unprotected attributes ($U$). Experiments on simulated data and on the Adult benchmark data set show effective repair.

With the advent of the AI Act and other regulations, there is now an urgent need for algorithms that repair unfairness in training data. In this paper, we define fairness in terms of conditional independence between protected attributes ($S$) and features ($X$), given unprotected attributes ($U$). We address the important setting in which torrents of archival data need to be repaired, using only a small proportion of these data, which are $S|U$-labelled (the research data). We use the latter to design optimal transport (OT)-based repair plans on interpolated supports. This allows off-sample, labelled, archival data to be repaired, subject to stationarity assumptions. It also significantly reduces the size of the supports of the OT plans, with correspondingly large savings in the cost of their design and of their sequential application to the off-sample data. We provide detailed experimental results with simulated and benchmark real data (the Adult data set). Our performance figures demonstrate effective repair -- in the sense of quenching conditional dependence -- of large quantities of off-sample, labelled (archival) data.
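
The idea of designing a repair on a small research sample and then applying it to off-sample archival data can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a single scalar feature $X$, a binary protected attribute $S$, and one fixed value of $U$, and it swaps in the closed-form one-dimensional monotone transport map to the Wasserstein barycenter in place of a general OT plan. The function names `design_repair_maps` and `repair`, the grid size, and the simulated Gaussian data are all hypothetical choices made for illustration.

```python
import numpy as np


def design_repair_maps(x_research, s_research, n_grid=50):
    """Design one repair map per protected group on a shared, small grid.

    Assumption: 1-D feature, data already restricted to one value of U.
    Each map sends a group's distribution to the 1-D Wasserstein barycenter
    of the group distributions, estimated from the research data only.
    """
    x_research = np.asarray(x_research, dtype=float)
    s_research = np.asarray(s_research)
    # Small interpolated support on which the maps are stored.
    grid = np.linspace(x_research.min(), x_research.max(), n_grid)
    probs = np.linspace(0.0, 1.0, n_grid)
    groups = np.unique(s_research)
    # Group proportions act as barycenter weights.
    weights = {g: np.mean(s_research == g) for g in groups}
    # Empirical quantile function of each group, evaluated at probs.
    quantiles = {g: np.quantile(x_research[s_research == g], probs) for g in groups}
    # In 1-D, the barycenter's quantile function is the weighted average.
    q_bar = sum(weights[g] * quantiles[g] for g in groups)
    maps = {}
    for g in groups:
        # Approximate CDF of group g at the grid points (inverse of its quantiles).
        cdf_on_grid = np.interp(grid, quantiles[g], probs)
        # Monotone map: x -> barycenter quantile at the group's CDF value.
        maps[g] = np.interp(cdf_on_grid, probs, q_bar)
    return grid, maps


def repair(x_archive, s_archive, grid, maps):
    """Apply the stored maps to off-sample points by linear interpolation.

    Assumption (stationarity): archival data follow the same distribution
    as the research data. Points outside the grid are clamped by np.interp.
    """
    x_archive = np.asarray(x_archive, dtype=float)
    s_archive = np.asarray(s_archive)
    repaired = np.empty_like(x_archive)
    for g, mapped_values in maps.items():
        mask = s_archive == g
        repaired[mask] = np.interp(x_archive[mask], grid, mapped_values)
    return repaired


# Simulated data (hypothetical): the group with S = 1 is shifted by 1.5.
rng = np.random.default_rng(0)


def sample(n):
    s = rng.integers(0, 2, size=n)
    x = rng.normal(loc=1.5 * s, scale=1.0)
    return x, s


x_res, s_res = sample(200)       # small labelled research set
x_arc, s_arc = sample(20_000)    # large archival set, repaired off-sample

grid, maps = design_repair_maps(x_res, s_res)
x_rep = repair(x_arc, s_arc, grid, maps)

gap_before = abs(x_arc[s_arc == 1].mean() - x_arc[s_arc == 0].mean())
gap_after = abs(x_rep[s_arc == 1].mean() - x_rep[s_arc == 0].mean())
print(f"group mean gap before repair: {gap_before:.3f}")
print(f"group mean gap after repair:  {gap_after:.3f}")
```

The maps are designed once, from 200 research points, and stored on a 50-point grid, so repairing each archival point costs only one interpolation. The gap between group means is a crude proxy for dependence between $X$ and $S$, used here only to keep the sketch short.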

