LG · AI · CY · ML · Jun 2, 2021

Addressing the Long-term Impact of ML Decisions via Policy Regret

arXiv:2106.01325v1 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of allocating opportunities fairly and effectively in domains like lending and education, with a focus on long-term community impacts. It builds incrementally on prior bandit models.

The paper tackles the problem of long-term impacts of ML allocation decisions by modeling communities as arms with evolving rewards and introducing policy regret as a stronger metric than external regret. It presents an algorithm with provably sub-linear policy regret and shows it outperforms baselines empirically, especially for long time horizons.

Machine Learning (ML) increasingly informs the allocation of opportunities to individuals and communities in areas such as lending, education, employment, and beyond. Such decisions often impact their subjects' future characteristics and capabilities in an a priori unknown fashion. The decision-maker, therefore, faces exploration-exploitation dilemmas akin to those in multi-armed bandits. Following prior work, we model communities as arms. To capture the long-term effects of ML-based allocation decisions, we study a setting in which the reward from each arm evolves every time the decision-maker pulls that arm. We focus on reward functions that are initially increasing in the number of pulls but may become (and remain) decreasing after a certain point. We argue that an acceptable sequential allocation of opportunities must take an arm's potential for growth into account. We capture these considerations through the notion of policy regret, a much stronger notion than the often-studied external regret, and present an algorithm with provably sub-linear policy regret for sufficiently long time horizons. We empirically compare our algorithm with several baselines and find that it consistently outperforms them, in particular for long time horizons.
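The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: the piecewise-linear reward shape, the parameter values, and the names `make_rise_then_fall`, `run_policy`, `myopic`, and `optimal_value` are all assumptions made for illustration. It models each arm's reward as a function of how often that arm has been pulled (rising, then falling), and compares a myopic policy against the best achievable allocation of pulls, each evaluated on the reward trajectory it induces itself. That gap is one reading of policy regret; the paper's precise definition may differ.

```python
from typing import Callable, List, Sequence

# Reward of the next pull, given how many times this arm was pulled before.
RewardFn = Callable[[int], float]


def make_rise_then_fall(base: float, slope_up: float,
                        peak_pulls: int, slope_down: float) -> RewardFn:
    """Hypothetical reward shape: linear growth up to a peak, then linear decay."""
    def reward(n: int) -> float:
        if n <= peak_pulls:
            return base + slope_up * n
        peak = base + slope_up * peak_pulls
        return max(0.0, peak - slope_down * (n - peak_pulls))
    return reward


def run_policy(arms: Sequence[RewardFn],
               choose: Callable[[Sequence[RewardFn], List[int], int], int],
               horizon: int) -> float:
    """Total reward a policy collects; each pull changes that arm's future reward."""
    pulls = [0] * len(arms)
    total = 0.0
    for t in range(horizon):
        i = choose(arms, pulls, t)
        total += arms[i](pulls[i])
        pulls[i] += 1
    return total


def myopic(arms: Sequence[RewardFn], pulls: List[int], t: int) -> int:
    """Pick the arm whose next pull pays most right now, ignoring growth potential."""
    return max(range(len(arms)), key=lambda i: arms[i](pulls[i]))


def optimal_value(arms: Sequence[RewardFn], horizon: int) -> float:
    """Best total reward over all ways to split `horizon` pulls among the arms.

    Because a reward depends only on the arm's own pull count, the total
    depends only on how many pulls each arm receives, so a small dynamic
    program over (arms considered so far, pulls used) is enough.
    """
    best = [0.0] + [float("-inf")] * horizon
    for arm in arms:
        prefix = [0.0]
        for n in range(horizon):
            prefix.append(prefix[-1] + arm(n))
        best = [max(best[t - n] + prefix[n] for n in range(t + 1))
                for t in range(horizon + 1)]
    return best[horizon]


# Arm 0 pays a steady 0.5; arm 1 starts low but grows before declining.
arms = [
    make_rise_then_fall(base=0.5, slope_up=0.0, peak_pulls=0, slope_down=0.0),
    make_rise_then_fall(base=0.1, slope_up=0.1, peak_pulls=10, slope_down=0.05),
]
horizon = 30
myopic_total = run_policy(arms, myopic, horizon)
best_total = optimal_value(arms, horizon)
print("myopic total:", myopic_total)
print("best allocation total:", best_total)
print("gap (policy regret under this sketch's definition):", best_total - myopic_total)
```

In this toy instance the myopic policy never pulls the second arm, so it never discovers that arm's growth. An external-regret comparison against the rewards actually observed would not penalize this, which is the intuition behind preferring policy regret here.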

Code Implementations: 1 repo