Optimal Fair Aggregation of Crowdsourced Noisy Labels using Demographic Parity Constraints
This work addresses fairness concerns in crowdsourced label aggregation, which matters for applications that rely on human annotations. The contribution is incremental, however, since it builds on existing fairness frameworks and post-processing methods.
The paper tackles the problem of fairness in aggregating noisy crowdsourced labels. It analyzes the fairness gaps of Majority Vote and Optimal Bayesian aggregation and shows that, under certain conditions, the fairness gap of the consensus converges exponentially fast to that of the ground truth. It also generalizes a multiclass fairness post-processing algorithm to enforce demographic parity constraints. Experiments on synthetic and real datasets validate the approach.
As acquiring reliable ground-truth labels is usually costly or infeasible, crowdsourcing and aggregation of noisy human annotations is the typical resort. Aggregating subjective labels, though, may amplify individual biases, particularly regarding sensitive features, raising fairness concerns. Nonetheless, fairness in crowdsourced aggregation remains largely unexplored, with no existing convergence guarantees and only limited post-processing approaches for enforcing $\varepsilon$-fairness under demographic parity. We address this gap by analyzing the fairness of crowdsourced aggregation methods within the $\varepsilon$-fairness framework, for Majority Vote and Optimal Bayesian aggregation. In the small-crowd regime, we derive an upper bound on the fairness gap of Majority Vote in terms of the fairness gaps of the individual annotators. We further show that the fairness gap of the aggregated consensus converges exponentially fast to that of the ground truth under interpretable conditions. Since the ground truth itself may still be unfair, we generalize a state-of-the-art multiclass fairness post-processing algorithm from the continuous to the discrete setting, which enforces strict demographic parity constraints on any aggregation rule. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and corroborate the theoretical insights.
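
To make the quantities in the abstract concrete, the following minimal sketch aggregates toy annotations by Majority Vote and measures a demographic parity gap on the resulting consensus. It is an illustration under stated assumptions, not the paper's method. The gap is taken here to be the largest difference, over classes, between the per-group rates at which a class is assigned; the paper's exact definition may differ. The function names `majority_vote` and `dp_gap`, the tie-breaking rule, the toy data, and the tolerance `epsilon` are all hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; ties go to the smallest label (an assumption)."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def dp_gap(labels, groups):
    """Assumed demographic parity gap: the largest difference, over classes,
    between the highest and lowest per-group rate of assigning that class."""
    classes = sorted(set(labels))
    group_ids = sorted(set(groups))
    gap = 0.0
    for c in classes:
        rates = []
        for g in group_ids:
            members = [y for y, s in zip(labels, groups) if s == g]
            rates.append(sum(1 for y in members if y == c) / len(members))
        gap = max(gap, max(rates) - min(rates))
    return gap


# Toy data: 8 items, 3 annotators each, binary labels (hypothetical values).
annotations = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
# Sensitive attribute of each item (hypothetical values).
groups = [0, 0, 0, 0, 1, 1, 1, 1]

consensus = [majority_vote(v) for v in annotations]
gap = dp_gap(consensus, groups)

epsilon = 0.1  # hypothetical fairness tolerance
print("consensus:", consensus)
print("demographic parity gap:", gap)
print("epsilon-fair:", gap <= epsilon)
```

On this toy input the consensus assigns label 1 to three of the four items in group 0 and to none in group 1, so the gap is 0.75 and the consensus is not $\varepsilon$-fair for `epsilon = 0.1`. A post-processing step of the kind the abstract describes would act on such a consensus to bring the gap within the tolerance.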