Properties of Group Fairness Metrics for Rankings
This work addresses a practical difficulty: choosing an appropriate group fairness metric for a ranking application. It provides a framework for evaluating and selecting among existing metrics; the contribution is incremental in that it builds on those metrics rather than introducing new ones.
The paper takes an axiomatic approach to selecting group fairness metrics for rankings: it defines thirteen properties, evaluates existing metrics against them, and finds that most metrics satisfy only a small subset, which highlights their limitations.
In recent years, several metrics have been developed for evaluating group fairness of rankings. Because these metrics were developed with different application contexts and ranking algorithms in mind, it is not obvious which metric to choose for a given scenario. In this paper, we perform a comprehensive comparative analysis of existing group fairness metrics developed in the context of fair ranking. Given the diversity of their application contexts, we argue that such a comparative analysis is not straightforward. Hence, we take an axiomatic approach whereby we design a set of thirteen properties for group fairness metrics that consider different ranking settings. A metric can then be selected depending on whether it satisfies all or a subset of these properties. We apply these properties to eleven existing group fairness metrics, and through both empirical and theoretical results we demonstrate that most of these metrics satisfy only a small subset of the proposed properties. These findings highlight limitations of existing metrics and provide insights into how to evaluate and interpret different fairness metrics in practical deployment. The proposed properties can also assist practitioners in selecting appropriate metrics for evaluating fairness in a specific application.
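The selection step described above (pick a metric according to which properties it satisfies) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the metric names, the property labels `P1` to `P5`, the table entries, and the helper `select_metrics` are hypothetical placeholders, not the paper's properties or results.

```python
from typing import Dict, List, Set

# Hypothetical satisfaction table: metric name -> properties it satisfies.
# These entries are placeholders, not findings from the paper.
SATISFIES: Dict[str, Set[str]] = {
    "metric_a": {"P1", "P2", "P5"},
    "metric_b": {"P1", "P3"},
    "metric_c": {"P1", "P2", "P3", "P5"},
}


def select_metrics(table: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Return, sorted by name, the metrics that satisfy every required property."""
    return sorted(name for name, props in table.items() if required <= props)


# A practitioner who needs properties P1 and P2 for their application:
candidates = select_metrics(SATISFIES, {"P1", "P2"})
print(candidates)  # ['metric_a', 'metric_c']
```

An empty result means that no metric in the table meets the full requirement, in which case the required set would have to be relaxed to a subset of the properties.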