Counterfactual Situation Testing: From Single to Multidimensional Discrimination
This work addresses the problem of identifying discrimination in AI systems for legal and fairness applications, offering a novel method that improves detection over prior approaches.
The paper tackles the problem of detecting individual discrimination in classifier decisions by introducing counterfactual situation testing (CST), a causal data mining framework that extends situation testing with counterfactual reasoning to handle single and multidimensional discrimination. Experimental results show that CST uncovers a higher number of discrimination cases than existing methods, even when models are counterfactually fair, and it provides confidence intervals for all tests.
We present counterfactual situation testing (CST), a causal data mining framework for detecting individual discrimination in a dataset of classifier decisions. CST answers the question ``what would have been the model outcome had the individual, or complainant, been of a different protected status?'' It extends the legally-grounded situation testing (ST) of Thanh et al. (2011) by operationalizing the notion of "fairness given the difference" via counterfactual reasoning. ST finds for each complainant similar protected and non-protected instances in the dataset; constructs, respectively, a control and test group; and compares the groups such that a difference in model outcomes implies a potential case of individual discrimination. CST, instead, avoids this idealized comparison by establishing the test group on the complainant's generated counterfactual, which reflects how the protected attribute when changed influences other seemingly neutral attributes of the complainant. Under CST we test for discrimination for each complainant by comparing similar individuals within the control and test group but dissimilar individuals across these groups. We consider single (e.g.,~gender) and multidimensional (e.g.,~gender and race) discrimination testing. For multidimensional discrimination we study multiple and intersectional discrimination and, as feared by legal scholars, find evidence that the former fails to account for the latter kind. Using a k-nearest neighbor implementation, we showcase CST on synthetic and real data. Experimental results show that CST uncovers a higher number of cases than ST, even when the model is counterfactually fair. CST, in fact, extends counterfactual fairness (CF) of Kusner et al. (2017) by equipping CF with confidence intervals, which we report for all experiments.