LG | May 20, 2024

DispaRisk: Auditing Fairness Through Usable Information

arXiv:2405.12372v3 | h-index: 4 | ECML/PKDD
Originality: Incremental advance
AI Analysis

This work addresses fairness issues in ML for sectors like healthcare and finance, though it appears incremental as it builds on existing usable information theory for early bias detection.

The authors tackled the problem of machine learning algorithms exacerbating societal biases by introducing DispaRisk, a framework that proactively assesses disparity risks in datasets early in the ML pipeline, demonstrating its effectiveness in identifying high-risk datasets and bias-prone model families.

Machine learning (ML) algorithms affect virtually every aspect of human life and are used across diverse sectors, including healthcare, finance, and education. ML algorithms have often been found to exacerbate societal biases present in datasets, leading to adverse impacts on subsets or groups of individuals and, in many cases, on minority groups. To mitigate these effects, it is crucial that disparities and biases are identified early in an ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduces complexity at later stages of model development. In this paper, we leverage recent advances in usable information theory to introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's effectiveness by benchmarking it against datasets commonly used in fairness research. Our findings demonstrate DispaRisk's ability to identify datasets with a high risk of discrimination, detect model families prone to biases within an ML pipeline, and enhance the explainability of these bias risks. This work contributes to the development of fairer ML systems by providing a robust tool for early bias detection and mitigation.
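The abstract does not spell out DispaRisk's actual metrics, so the following is only a minimal sketch of the underlying idea of usable (predictive V-) information: how much a restricted family of predictors V can reduce log-loss on a target once it sees a feature, I_V(X -> Y) = H_V(Y) - H_V(Y | X). It assumes, purely for illustration, that V is the family of lookup tables over one discrete feature and that the estimate is computed in-sample (a real audit would use held-out data and the model families of interest). All function names and the toy data are hypothetical and not taken from the paper or its repository.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(labels):
    """Empirical entropy in bits: log-loss of the best constant predictor, H_V(Y)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy_bits(xs, labels):
    """Log-loss of the best lookup-table predictor x -> P(y | x), i.e. H_V(Y | X)."""
    by_value = defaultdict(list)
    for x, y in zip(xs, labels):
        by_value[x].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy_bits(g) for g in by_value.values())


def usable_information_bits(xs, labels):
    """I_V(X -> Y) = H_V(Y) - H_V(Y | X) for the lookup-table family (assumption)."""
    return entropy_bits(labels) - conditional_entropy_bits(xs, labels)


# Hypothetical toy data: one categorical feature, a sensitive attribute, a label.
feature = ["a", "a", "b", "b"]
sensitive = [0, 0, 1, 1]   # fully recoverable from the feature
label = [1, 0, 1, 0]       # not recoverable from the feature

info_about_sensitive = usable_information_bits(feature, sensitive)
info_about_label = usable_information_bits(feature, label)
print(info_about_sensitive, info_about_label)
```

In this toy case the feature carries one full bit about the sensitive attribute and none about the label, the kind of pattern an early dataset audit would flag as a proxy risk before any model is trained.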

Code Implementations: 1 repo
