Fairness Violations and Mitigation under Covariate Shift
This work addresses fairness violations of machine learning models on unseen test sets, particularly in domains such as healthcare. It is incremental, however, because it builds on the existing domain adaptation and fairness literature.
The paper tackles the problem of learning fair prediction models that remain stable under covariate shift, where the test distribution differs from the training distribution. It identifies sufficient conditions for such stability. It then uses the causal graph to select features, which allows accuracy and fairness metrics to be estimated on the test set. It shows worst-case optimality for specific fairness definitions and illustrates the advantages of the approach on a healthcare task.
We study the problem of learning fair prediction models for unseen test sets distributed differently from the training set. Stability against changes in data distribution is an important mandate for responsible deployment of models. The domain adaptation literature addresses this concern, albeit with the notion of stability limited to prediction accuracy. We identify sufficient conditions under which stable models, both in terms of prediction accuracy and fairness, can be learned. Using the causal graph describing the data and the anticipated shifts, we specify an approach based on feature selection that exploits conditional independencies in the data to estimate accuracy and fairness metrics for the test set. We show that for specific fairness definitions, the resulting model satisfies a form of worst-case optimality. In the context of a healthcare task, we illustrate the advantages of the approach in making more equitable decisions.
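The causal-graph feature selection described above can be illustrated with a simplified sketch. The sketch assumes the anticipated shift is encoded as an extra "domain" node T in the DAG, which is a common device in the domain adaptation literature. It treats a feature set S as stable when the target Y is d-separated from T given S, so that P(Y | S) is the same on training and test data. This criterion is a simplification. It ignores the sensitive attribute and the paper's exact sufficient conditions. The graph representation and the function names (`d_separated`, `stable_feature_sets`) are hypothetical. The d-separation test uses the standard moralized ancestral graph criterion.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical DAG representation: node -> set of its parents.
Dag = Dict[str, Set[str]]


def ancestors(dag: Dag, nodes: Iterable[str]) -> Set[str]:
    """Return `nodes` together with all of their ancestors in `dag`."""
    result: Set[str] = set()
    stack = list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(dag.get(v, set()))
    return result


def d_separated(dag: Dag, x: str, y: str, z: Set[str]) -> bool:
    """Test x _||_ y | z using the moralized ancestral graph criterion."""
    keep = ancestors(dag, {x, y} | z)
    # Undirected moral graph restricted to the ancestral set.
    adj: Dict[str, Set[str]] = {v: set() for v in keep}
    for child in keep:
        parents = [p for p in dag.get(child, set()) if p in keep]
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        # "Marry" every pair of co-parents of the same child.
        for p, q in combinations(parents, 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff z blocks every path between them here.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v]:
            if w not in seen and w not in z:
                seen.add(w)
                stack.append(w)
    return True


def stable_feature_sets(dag: Dag, target: str, shift: str,
                        candidates: List[str]) -> List[FrozenSet[str]]:
    """Enumerate candidate subsets S with target _||_ shift | S (simplified criterion)."""
    found = []
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            s = set(subset)
            if d_separated(dag, target, shift, s):
                found.append(frozenset(s))
    return found
```

For example, take the toy graph `{"T": set(), "X1": {"T"}, "Y": {"X1"}, "X2": {"Y", "T"}}`, in which the shift acts on X1 and on X2. Here the only stable subset of `["X1", "X2"]` is `{X1}`. Adding X2 conditions on the collider Y → X2 ← T and reopens a path to the shift. Exhaustive enumeration is exponential in the number of candidates, so it only suits small illustrative graphs.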
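To make "accuracy and fairness metrics for the test set" concrete, the sketch below estimates test-set accuracy and two fairness gaps from labeled training data. The gaps are demographic parity and equal opportunity. The estimate uses importance weighting, which is a standard domain adaptation technique swapped in here for illustration. It is not the paper's feature-selection approach. The sketch assumes the following:

* The density ratio p_test(x)/p_train(x) is already known for each training example.
* Labels and predictions are binary.
* The sensitive attribute is binary.

The function names are hypothetical.

```python
from typing import Optional, Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average; returns nan if the total weight is zero."""
    total = sum(weights)
    if total == 0:
        return float("nan")
    return sum(v * w for v, w in zip(values, weights)) / total


def shifted_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> dict:
    """Estimate test-set accuracy and fairness gaps from labeled training data.

    weights[i] is an assumed importance weight p_test(x_i) / p_train(x_i);
    weights=None gives ordinary unweighted training-set metrics.
    group[i] is a binary sensitive attribute (0 or 1).
    """
    n = len(y_true)
    w = list(weights) if weights is not None else [1.0] * n

    correct = [float(t == p) for t, p in zip(y_true, y_pred)]
    accuracy = weighted_mean(correct, w)

    rates, tprs = {}, {}
    for a in (0, 1):
        idx = [i for i in range(n) if group[i] == a]
        # Demographic parity: P(Yhat = 1 | A = a).
        rates[a] = weighted_mean([y_pred[i] for i in idx], [w[i] for i in idx])
        # Equal opportunity: P(Yhat = 1 | Y = 1, A = a).
        pos = [i for i in idx if y_true[i] == 1]
        tprs[a] = weighted_mean([y_pred[i] for i in pos], [w[i] for i in pos])

    return {
        "accuracy": accuracy,
        "dp_gap": abs(rates[0] - rates[1]),
        "eo_gap": abs(tprs[0] - tprs[1]),
    }
```

For example, use `y_true=[1, 0, 1, 1, 0, 1]`, `y_pred=[1, 0, 0, 1, 1, 1]` and `group=[0, 0, 0, 1, 1, 1]`. The unweighted accuracy is 2/3 and the demographic parity gap is 2/3. Weights `[2, 1, 1, 1, 1, 2]` shift these estimates to 3/4 and 1/2. Importance weighting becomes unreliable when the density ratio is large or poorly estimated. This is one motivation for approaches that avoid reweighting altogether.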