Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support
This work addresses fairness issues in AI for practitioners, but it is incremental as it builds on prior research to identify specific implementation challenges.
The study investigated gaps in fairness practices for AI systems, specifically disaggregated evaluations, by interviewing 33 practitioners from three tech companies, finding challenges in metrics, stakeholder identification, and data collection.
Various tools and practices have been developed to support practitioners in identifying, assessing, and mitigating fairness-related harms caused by AI systems. However, prior research has highlighted gaps between the intended design of these tools and practices and their use within particular contexts, including gaps caused by the role that organizational factors play in shaping fairness work. In this paper, we investigate these gaps for one such practice: disaggregated evaluations of AI systems, intended to uncover performance disparities between demographic groups. By conducting semi-structured interviews and structured workshops with thirty-three AI practitioners from ten teams at three technology companies, we identify practitioners' processes, challenges, and needs for support when designing disaggregated evaluations. We find that practitioners face challenges when choosing performance metrics, identifying the most relevant direct stakeholders and demographic groups on which to focus, and collecting datasets with which to conduct disaggregated evaluations. More generally, we identify impacts on fairness work stemming from a lack of engagement with direct stakeholders or domain experts, business imperatives that prioritize customers over marginalized groups, and the drive to deploy AI systems at scale.