SE · AI · CY · DB · LG
Sep 11, 2017

Fairness Testing: Testing Software for Discrimination

arXiv:1709.03221v1
433 citations
Originality: Highly original
AI Analysis

This paper addresses software discrimination, a critical issue for developers and users in domains such as criminal justice and finance, and provides initial tools for fairness testing.

The paper defines software fairness and develops Themis, a testing-based method that automatically generates efficient test suites to measure discrimination. The evaluation finds Themis effective at discovering discrimination and shows that state-of-the-art techniques for removing discrimination fail in many situations, at times discriminating against as much as 98% of an input subdomain.

This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.
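The causal testing idea in the abstract can be illustrated with a minimal sketch. This is not Themis's actual interface: the function `causal_discrimination_score`, the dictionary-based schema, and the toy `biased_loan` and `fair_loan` systems are all hypothetical names introduced here. The sketch assumes that a system discriminates causally on an input when changing only the sensitive attribute changes the output, and it estimates the fraction of such inputs by random sampling from the schema. No oracle is needed because the system's outputs are compared with each other rather than with a known correct answer.

```python
import random
from typing import Callable, Dict, Hashable, Sequence

# Hypothetical schema format: attribute name -> list of valid values.
Schema = Dict[str, Sequence]


def causal_discrimination_score(
    decide: Callable[[dict], Hashable],
    schema: Schema,
    sensitive: str,
    samples: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate the fraction of inputs whose output changes when only
    the sensitive attribute is varied (all other attributes held fixed)."""
    rng = random.Random(seed)
    others = [name for name in schema if name != sensitive]
    changed = 0
    for _ in range(samples):
        # Draw a random assignment for every non-sensitive attribute.
        base = {name: rng.choice(list(schema[name])) for name in others}
        # Run the system once per value of the sensitive attribute.
        outputs = {decide({**base, sensitive: v}) for v in schema[sensitive]}
        # More than one distinct output means the attribute alone
        # caused a different decision for this input.
        if len(outputs) > 1:
            changed += 1
    return changed / samples


# Toy systems under test (assumptions for illustration only).
def biased_loan(applicant: dict) -> bool:
    threshold = 80 if applicant["group"] == "B" else 50
    return applicant["income"] >= threshold


def fair_loan(applicant: dict) -> bool:
    return applicant["income"] >= 50


schema: Schema = {"income": list(range(0, 100, 10)), "group": ["A", "B"]}

biased_score = causal_discrimination_score(biased_loan, schema, "group")
fair_score = causal_discrimination_score(fair_loan, schema, "group")
```

In this toy setup, `biased_loan` treats the two groups differently only for incomes of 50, 60, and 70, so its estimated score should land near 0.3, while `fair_loan` ignores the group and scores exactly 0. Uniform random sampling is a simplification chosen for brevity; the abstract describes Themis as applying optimizations to keep test suites efficient, which this sketch does not attempt to reproduce.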

