Diversity in Sociotechnical Machine Learning Systems
This work addresses the problem of vague and uninformative approaches to diversity in ML for researchers and practitioners, though it is incremental as it synthesizes existing knowledge rather than proposing new methods.
The paper tackles the gap between discussions of diversity's benefits and measures in machine learning and the underlying concepts and mechanisms, by introducing a taxonomy of diversity concepts and mechanisms from other fields to clarify discourse and improve research in sociotechnical ML systems.
There has been a surge of recent interest in sociocultural diversity in machine learning (ML) research, with researchers (i) examining the benefits of diversity as an organizational solution for alleviating problems with algorithmic bias, and (ii) proposing measures and methods for implementing diversity as a design desideratum in the construction of predictive algorithms. Currently, however, there is a gap between discussions of measures and benefits of diversity in ML, on the one hand, and the broader research on the underlying concepts of diversity and the precise mechanisms of its functional benefits, on the other. This gap is problematic because diversity is not a monolithic concept. Rather, different concepts of diversity are based on distinct rationales that should inform how we measure diversity in a given context. Similarly, the lack of specificity about the precise mechanisms underpinning diversity's potential benefits can result in uninformative generalities, invalid experimental designs, and illicit interpretations of findings. In this work, we draw on research in philosophy, psychology, and social and organizational sciences to make three contributions: First, we introduce a taxonomy of different diversity concepts from philosophy of science, and explicate the distinct epistemic and political rationales underlying these concepts. Second, we provide an overview of mechanisms by which diversity can benefit group performance. Third, we situate these taxonomies--of concepts and mechanisms--in the lifecycle of sociotechnical ML systems and make a case for their usefulness in fair and accountable ML. We do so by illustrating how they clarify the discourse around diversity in the context of ML systems, promote the formulation of more precise research questions about diversity's impact, and provide conceptual tools to further advance research and practice.