Fairness of Classifiers in the Presence of Constraints between Features
For ML practitioners and regulators, this work provides a formal framework for detecting biases that are hidden by dependencies between features. The analysis is purely theoretical and includes no empirical validation.
The paper proposes a definition of fairness based on prime-implicant explanations that account for feature constraints, showing that ignoring constraints can alter fairness assessments. It identifies three classifier-level fairness definitions and analyzes their relationships and computational complexity.
In Machine Learning, an accepted definition of fairness of a decision taken by a classifier is that it should not depend on protected features, such as gender. Unfortunately, when constraints exist between features, such dependencies can be obscured by the constraints. To avoid this problem, we propose that a decision be considered fair if it has a fair explanation. We define a fair explanation as a prime-implicant reason for the decision that does not contain any protected feature (where the constraints are taken into account in the definition of prime-implicant). Surprisingly, ignoring constraints can completely change the fairness of a decision (according to this definition) even in the absence of constraints between protected and unprotected features. Three possible definitions of fairness of a classifier are that for all its decisions (1) there are only fair explanations, (2) there is at least one fair explanation, or (3) changing protected features does not change the outcome. We identify the relationships between these different definitions of fairness and study the computational complexity of testing fairness of classifiers.
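The effect of constraints on fair explanations can be illustrated with a minimal brute-force sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy classifier `f`, the three Boolean features (index 0 protected, indices 1 and 2 unprotected), the constraint "feature 1 implies feature 2", and the helper names `explanations` and `is_fair`. An explanation is computed here as a subset-minimal set of features whose values in `x` force the decision on every instance that satisfies the constraints, which is one reading of a prime-implicant reason under constraints. The enumeration is exponential in the number of features, so it only suits tiny examples.

```python
from itertools import combinations, product

def explanations(f, x, feasible=lambda z: True):
    """Return all subset-minimal feature sets S such that every feasible
    instance agreeing with x on S receives the same label as x."""
    n = len(x)
    target = f(x)
    # Only instances that satisfy the constraints are considered.
    space = [z for z in product((0, 1), repeat=n) if feasible(z)]

    def sufficient(S):
        return all(f(z) == target
                   for z in space
                   if all(z[i] == x[i] for i in S))

    found = []
    # Enumerate by increasing size, so any sufficient set that does not
    # contain an earlier result is guaranteed to be minimal.
    for size in range(n + 1):
        for S in combinations(range(n), size):
            if any(set(T) <= set(S) for T in found):
                continue
            if sufficient(S):
                found.append(S)
    return found

def is_fair(S, protected):
    """An explanation is fair if it mentions no protected feature."""
    return not (set(S) & set(protected))

# Hypothetical toy setup (not from the paper).
PROTECTED = {0}
f = lambda z: int((z[1] and z[2]) or (z[0] and z[1]))
constraint = lambda z: not (z[1] == 1 and z[2] == 0)  # feature 1 implies feature 2
x = (1, 1, 1)

ignoring = explanations(f, x)               # constraints ignored
respecting = explanations(f, x, constraint)  # constraints taken into account

print(ignoring, [is_fair(S, PROTECTED) for S in ignoring])
print(respecting, [is_fair(S, PROTECTED) for S in respecting])
```

In this toy setup the constraint links only unprotected features. When it is ignored, the decision has both a fair explanation `(1, 2)` and an unfair one `(0, 1)`. When it is taken into account, the only explanation is `(1,)`, which is fair. The decision therefore has only fair explanations under the constraint, but not when the constraint is ignored.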