CY LG ME | Dec 30, 2025

Statistical Guarantees in the Search for Less Discriminatory Algorithms

Chris Hays, Ben Laufer, Solon Barocas, Manish Raghavan

arXiv:2512.23943v11.2 | h-index: 37

Originality: Incremental advance

AI Analysis

This work addresses the need for firms in high-stakes domains like credit and employment to certify their good-faith efforts to reduce algorithmic discrimination. The advance is incremental because it builds on existing model multiplicity concepts.

The paper asks when a firm has made a sufficient effort to find less discriminatory algorithms (LDAs) and formalizes the search as an optimal stopping problem. It provides an adaptive stopping algorithm that yields a high-probability upper bound on the potential gains from continued search, and it validates the method on real-world datasets.

Recent scholarship has argued that firms building data-driven decision systems in high-stakes domains like employment, credit, and housing should search for "less discriminatory algorithms" (LDAs) (Black et al., 2024). That is, for a given decision problem, firms considering deploying a model should make a good-faith effort to find equally performant models with lower disparate impact across social groups. Evidence from the literature on model multiplicity shows that randomness in training pipelines can lead to multiple models with the same performance, but meaningful variations in disparate impact. This suggests that developers can find LDAs simply by randomly retraining models. Firms cannot continue retraining forever, though, which raises the question: What constitutes a good-faith effort? In this paper, we formalize LDA search via model multiplicity as an optimal stopping problem, where a model developer with limited information wants to produce strong evidence that they have sufficiently explored the space of models. Our primary contribution is an adaptive stopping algorithm that yields a high-probability upper bound on the gains achievable from a continued search, allowing the developer to certify (e.g., to a court) that their search was sufficient. We provide a framework under which developers can impose stronger assumptions about the distribution of models, yielding correspondingly stronger bounds. We validate the method on real-world credit, employment and housing datasets.
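To make the idea of a high-probability certificate concrete, here is a minimal sketch of a much simpler, non-adaptive bound. It is not the paper's algorithm. It assumes retrained models are i.i.d. draws from a fixed distribution and uses the standard fact that, after n draws, the probability mass of models with strictly lower disparity than the best one seen exceeds p with probability at most (1 - p)^n. Note that this bounds how likely a further retrain is to improve, not how large the improvement could be. The names `tail_mass_bound`, `required_retrains`, `run_search`, and `train_and_measure` are hypothetical.

```python
import math
import random


def tail_mass_bound(n: int, delta: float) -> float:
    """With probability >= 1 - delta over n i.i.d. retrains, the chance that
    one more retrain beats the best disparity seen so far is at most this value.
    Follows from P(mass below the minimum > p) <= (1 - p) ** n."""
    if n < 1 or not 0.0 < delta < 1.0:
        raise ValueError("need n >= 1 and 0 < delta < 1")
    return 1.0 - delta ** (1.0 / n)


def required_retrains(epsilon: float, delta: float) -> int:
    """Smallest n with (1 - epsilon) ** n <= delta, fixed before the search starts."""
    if not 0.0 < epsilon < 1.0 or not 0.0 < delta < 1.0:
        raise ValueError("need 0 < epsilon < 1 and 0 < delta < 1")
    return math.ceil(math.log(delta) / math.log1p(-epsilon))


def run_search(train_and_measure, epsilon: float, delta: float):
    """Retrain a pre-committed number of times and return (best disparity, n).
    train_and_measure is a hypothetical callable that retrains the model with
    fresh randomness and returns its measured disparity (lower is better)."""
    n = required_retrains(epsilon, delta)
    best = min(train_and_measure() for _ in range(n))
    return best, n


if __name__ == "__main__":
    # Stand-in for a real training pipeline: disparity drawn uniformly from [0, 1).
    rng = random.Random(0)
    best, n = run_search(rng.random, epsilon=0.05, delta=0.05)
    print(f"retrains: {n}, best disparity: {best:.4f}, "
          f"bound on improving mass: {tail_mass_bound(n, 0.05):.4f}")
```

Because the number of retrains is fixed in advance, this sketch avoids the optional-stopping issues that an adaptive rule like the paper's has to handle. It also makes no distributional assumptions beyond i.i.d. sampling, which is why it says nothing about the size of the achievable gain.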
