LG, CY, IR | Aug 17, 2021

Identifying Biased Subgroups in Ranking and Classification

arXiv:2108.07450v1 | 14 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for automated bias detection in ML systems, which reduces reliance on domain experts. The advance is incremental because it builds on existing fairness and interpretability methods.

The paper tackles the problem of automatically identifying data subgroups where machine learning algorithms show biased performance, introducing a divergence measure to detect such subgroups in classification and ranking, and using Shapley values to quantify attribute contributions.

When analyzing the behavior of machine learning algorithms, it is important to identify specific data subgroups for which the considered algorithm shows different performance with respect to the entire dataset. The intervention of domain experts is normally required to identify relevant attributes that define these subgroups. We introduce the notion of divergence to measure this performance difference and we exploit it in the context of (i) classification models and (ii) ranking applications to automatically detect data subgroups showing a significant deviation in their behavior. Furthermore, we quantify the contribution of all attributes in the data subgroup to the divergent behavior by means of Shapley values, thus allowing the identification of the most impacting attributes.
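The two ideas in the abstract can be illustrated with a minimal sketch. It assumes that divergence is the difference between a performance metric computed on a subgroup and the same metric computed on the whole dataset, and that a subgroup is a set of (attribute, value) pairs; the abstract does not spell out either definition. The toy data and the names `error_rate`, `select`, `divergence`, and `shapley_contributions` are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial

# Toy data (hypothetical): two categorical attributes, a true label, a prediction.
rows = [
    {"sex": "F", "age": "young", "label": 1, "pred": 0},
    {"sex": "F", "age": "young", "label": 1, "pred": 0},
    {"sex": "F", "age": "old",   "label": 0, "pred": 0},
    {"sex": "F", "age": "old",   "label": 1, "pred": 0},
    {"sex": "M", "age": "young", "label": 1, "pred": 1},
    {"sex": "M", "age": "young", "label": 0, "pred": 1},
    {"sex": "M", "age": "old",   "label": 0, "pred": 0},
    {"sex": "M", "age": "old",   "label": 1, "pred": 1},
]

def error_rate(data):
    """Fraction of rows whose prediction differs from the label."""
    return sum(r["pred"] != r["label"] for r in data) / len(data)

def select(data, itemset):
    """Rows that match every (attribute, value) pair in the itemset."""
    return [r for r in data if all(r[a] == v for a, v in itemset)]

def divergence(data, itemset, metric=error_rate):
    """Assumed definition: metric on the subgroup minus metric on all data."""
    subgroup = select(data, itemset)
    if not subgroup:
        return 0.0  # assumption: an empty subgroup counts as non-divergent
    return metric(subgroup) - metric(data)

def shapley_contributions(data, itemset, metric=error_rate):
    """Exact Shapley value of each item, using divergence as the payoff."""
    items = list(itemset)
    n = len(items)
    contributions = {}
    for item in items:
        others = [i for i in items if i != item]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                with_item = divergence(data, coalition + (item,), metric)
                without_item = divergence(data, coalition, metric)
                total += weight * (with_item - without_item)
        contributions[item] = total
    return contributions

pattern = (("sex", "F"), ("age", "young"))
subgroup_divergence = divergence(rows, pattern)
contributions = shapley_contributions(rows, pattern)
print(subgroup_divergence, contributions)
```

By the efficiency property of Shapley values, the per-attribute contributions sum to the divergence of the full subgroup, which makes them readable as a decomposition of the performance gap. The exact enumeration is exponential in the number of attributes in the subgroup, so it is practical only for short patterns.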
