Compliant But Unsatisfactory: The Gap Between Auditing Standards and Practices for Probabilistic Genotyping Software
For AI governance and auditing communities, the paper highlights how vague requirements in audit standards can undermine their effectiveness, offering design recommendations to improve accountability.
The paper examines how poorly designed audit standards can mask inadequate AI systems, using the ASB 018 standard for probabilistic genotyping software as a case study. It identifies gaps between the standard's intended outcomes and actual audit practices; for example, an audit can comply with the standard without establishing restrictions on software use based on observed failures.
AI governance efforts increasingly rely on audit standards: agreed-upon practices for conducting audits. However, poorly designed standards can hide and lend credibility to inadequate systems. We explore how an audit standard's design influences its effectiveness through a case study of ASB 018, a standard for auditing probabilistic genotyping software, which the U.S. criminal legal system increasingly uses to analyze DNA samples. Through qualitative analysis of ASB 018 and five audit reports, we identify numerous gaps between the standard's desired outcomes and the auditing practices it enables. For instance, ASB 018 envisions that compliant audits establish restrictions on software use based on observed failures. However, audits can comply without establishing such restrictions. We connect these gaps to features of how the standard's requirements are designed, such as vague language and undefined terms. We conclude with recommendations for designing audit standards and evaluating their effectiveness.