Trading-off price for data quality to achieve fair online allocation
This work addresses fairness in online allocation for decision-makers who cannot directly observe protected attributes, offering a practical solution by trading off cost for data quality, though it is incremental in combining bandit and allocation problems.
The paper tackles the problem of online allocation with long-term fairness constraints when protected attributes are unobserved, by allowing the purchase of data from varying quality sources to estimate these attributes. The authors propose an algorithm that jointly selects data sources and performs allocations, achieving a regret bound of O(√T) and adapting to different fairness notions.
We consider the problem of online allocation subject to a long-term fairness penalty. Contrary to existing works, however, we do not assume that the decision-maker observes the protected attributes -- which is often unrealistic in practice. Instead they can purchase data that help estimate them from sources of different quality; and hence reduce the fairness penalty at some cost. We model this problem as a multi-armed bandit problem where each arm corresponds to the choice of a data source, coupled with the online allocation problem. We propose an algorithm that jointly solves both problems and show that it has a regret bounded by $\mathcal{O}(\sqrt{T})$. A key difficulty is that the rewards received by selecting a source are correlated by the fairness penalty, which leads to a need for randomization (despite a stochastic setting). Our algorithm takes into account contextual information available before the source selection, and can adapt to many different fairness notions. We also show that in some instances, the estimates used can be learned on the fly.