Towards Auditability for Fairness in Deep Learning
This work addresses the problem of auditing individual predictions of deep learning models for fairness, a capability that practitioners need when deploying these models in sensitive applications.
This paper introduces smooth prediction sensitivity, a new measure of individual fairness for deep learning models. It aims to identify blatantly unfair predictions even in models that appear fair by group metrics; preliminary results suggest that it can help distinguish fair from unfair predictions.
Group fairness metrics can detect when a deep learning model behaves differently for advantaged and disadvantaged groups, but even models that score well on these metrics can make blatantly unfair predictions. We present smooth prediction sensitivity, an efficiently computed measure of individual fairness for deep learning models that is inspired by ideas from interpretability in deep learning. Smooth prediction sensitivity allows individual predictions to be audited for fairness. We present preliminary experimental results suggesting that smooth prediction sensitivity can help distinguish between fair and unfair predictions, and that it may help detect blatantly unfair predictions from "group-fair" models.
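The abstract does not state how smooth prediction sensitivity is defined. The snippet below is a minimal sketch under stated assumptions, not the paper's definition. It assumes the measure combines (a) a gradient-based sensitivity of a single prediction to a change in a protected attribute with (b) SmoothGrad-style averaging over noisy copies of the input, an interpretability technique chosen here only because the abstract cites interpretability as inspiration. To stay self-contained it uses a logistic model with an analytic input gradient instead of a deep network. The names `smooth_prediction_sensitivity`, the protected-attribute direction `v`, `sigma`, and `n_samples` are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def smooth_prediction_sensitivity(w, b, x, v, sigma=0.1, n_samples=50, seed=0):
    """Hypothetical sketch, not the paper's definition.

    Averages |grad_x f(x_i) . v| over noisy copies x_i = x + N(0, sigma^2 I),
    where f(x) = sigmoid(w . x + b) and v is an assumed direction describing
    how the input changes when the protected attribute changes.
    """
    rng = np.random.default_rng(seed)
    # SmoothGrad-style noise: n_samples perturbed copies of the single input x.
    noisy = x + rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    p = sigmoid(noisy @ w + b)
    # Analytic input gradient of a logistic model: f(x) * (1 - f(x)) * w.
    grads = (p * (1.0 - p))[:, None] * w
    # Directional derivative along v, made non-negative, then averaged.
    return float(np.mean(np.abs(grads @ v)))


# Hypothetical example: the last feature encodes the protected attribute.
x = np.array([0.5, -1.0, 1.0])
v = np.array([0.0, 0.0, 1.0])
w_ignores_attr = np.array([1.0, -0.5, 0.0])  # weight 0 on the protected feature
w_uses_attr = np.array([1.0, -0.5, 2.0])     # relies on the protected feature

score_ignores = smooth_prediction_sensitivity(w_ignores_attr, 0.0, x, v)
score_uses = smooth_prediction_sensitivity(w_uses_attr, 0.0, x, v)
print(score_ignores)  # 0.0
print(score_uses)     # strictly positive
```

Under these assumptions, a prediction that does not depend on the protected feature scores exactly zero, while one that does scores higher. Averaging over noise is meant to reduce the sensitivity of a single gradient to local irregularities in the model. With a real deep network, the analytic gradient would be replaced by automatic differentiation.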