Iterative Orthogonal Feature Projection for Diagnosing Bias in Black-Box Models
The work addresses fairness issues in high-stakes domains, but its contribution is incremental because it builds on existing interpretability techniques.
The paper tackles unintentional discrimination in black-box predictive models used to decide access to services such as credit and employment. It introduces an iterative orthogonal projection method that quantifies how strongly a model depends on each input attribute, which in turn supports fairness assessment.
Predictive models are increasingly deployed to determine access to services such as credit, insurance, and employment. Despite potential gains in productivity and efficiency, several problems have yet to be addressed, particularly the risk of unintentional discrimination. We present an iterative procedure, based on orthogonal projection of input attributes, for enabling interpretability of black-box predictive models. Through this procedure, one can quantify the relative dependence of a black-box model on its input attributes. The relative significance of the inputs to a predictive model can then be used to assess the fairness (or discriminatory extent) of such a model.
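To make the idea concrete, the following is a minimal sketch of one way such a procedure could look, not the authors' implementation. It assumes numeric tabular inputs and a black-box `predict` callable. The helper names `remove_feature` and `relative_dependence` are hypothetical, and so are two choices the abstract does not specify: measuring dependence as the mean absolute change in predictions, and normalizing the scores to sum to 1. For each attribute in turn, the sketch projects every other attribute onto the orthogonal complement of that attribute, so that linearly correlated proxies cannot carry its signal. It then compares the model's predictions before and after.

```python
import numpy as np


def remove_feature(X, j):
    """Return a copy of X with the linear influence of column j removed.

    Every column is replaced by its residual after orthogonal projection
    onto the centered column j; column j itself collapses to its mean.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                      # center so the projection removes covariance
    v = Xc[:, j]
    norm_sq = v @ v
    if norm_sq == 0.0:                 # constant column: nothing to remove
        return X.copy()
    coeffs = (Xc.T @ v) / norm_sq      # projection coefficient of each column on v
    return Xc - np.outer(v, coeffs) + mean


def relative_dependence(predict, X):
    """Score each attribute by how much predictions move when it is removed.

    `predict` is treated as a black box mapping an (n, d) array to n outputs.
    Scores are normalized to sum to 1 (an illustrative assumption).
    """
    X = np.asarray(X, dtype=float)
    baseline = predict(X)
    scores = np.array([
        np.mean(np.abs(baseline - predict(remove_feature(X, j))))
        for j in range(X.shape[1])
    ])
    total = scores.sum()
    return scores / total if total > 0 else scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))

    def black_box(A):
        # Toy stand-in for an opaque model; it ignores attribute 1.
        return 3.0 * A[:, 0] + 1.0 * A[:, 2]

    print(relative_dependence(black_box, X))
```

In this toy setup the first attribute should receive the largest share and the ignored attribute the smallest. Orthogonal projection removes only linear dependence between attributes, so nonlinear proxies for a protected attribute would not be neutralized by this sketch.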