LG · EM · ML · Oct 22, 2025

Policy Learning with Abstention

Stanford
arXiv:2510.19672v2 · h-index: 39
Originality: Incremental advance
AI Analysis

This work addresses the risk of forcing decisions under uncertainty in high-stakes applications such as personalized medicine and advertising by allowing policies to abstain. It is an incremental advance over existing policy learning methods.

The authors address the problem of policy learning algorithms being forced to decide under uncertainty by introducing abstention, where a policy can defer to a safe default or an expert, and develop a two-stage learner with fast O(1/n)-type regret guarantees. They show that abstention yields improved guarantees under margin conditions without assuming realizability, connects to distributionally robust policy learning, and supports safe policy improvement over a baseline policy with high probability.

Policy learning algorithms are widely used in areas such as personalized medicine and advertising to develop individualized treatment regimes. However, most methods force a decision even when predictions are uncertain, which is risky in high-stakes settings. We study policy learning with abstention, where a policy may defer to a safe default or an expert. When a policy abstains, it receives a small additive reward on top of the value of a random guess. We propose a two-stage learner that first identifies a set of near-optimal policies and then constructs an abstention rule from their disagreements. We establish fast O(1/n)-type regret guarantees when propensities are known, and extend these guarantees to the unknown-propensity case via a doubly robust (DR) objective. We further show that abstention is a versatile tool with direct applications to other core problems in policy learning: it yields improved guarantees under margin conditions without the common realizability assumption, connects to distributionally robust policy learning by hedging against small data shifts, and supports safe policy improvement by ensuring improvement over a baseline policy with high probability.
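To make the two-stage idea in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes two actions, known propensities, inverse-propensity-weighted (IPW) value estimates, and a finite class of threshold policies on a scalar covariate. The function names (`ipw_scores`, `fit_abstaining_policy`, `estimated_value`), the tolerance `eps`, and the `ABSTAIN` marker are all hypothetical choices made for illustration; the paper's actual tolerance and objective may differ.

```python
import numpy as np

ABSTAIN = -1  # hypothetical marker for "defer to the default or expert"


def ipw_scores(y, a, propensity):
    """Return an (n, 2) array; column k is the IPW reward estimate for action k.

    `propensity[i]` is the known probability that action 1 was taken for sample i.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=int)
    propensity = np.asarray(propensity, dtype=float)
    n = len(y)
    prob_taken = np.where(a == 1, propensity, 1.0 - propensity)
    scores = np.zeros((n, 2))
    scores[np.arange(n), a] = y / prob_taken  # zero for the action not taken
    return scores


def fit_abstaining_policy(x, y, a, propensity, thresholds, eps):
    """Two-stage sketch over threshold policies pi_t(x) = 1[x > t]."""
    x = np.asarray(x, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    scores = ipw_scores(y, a, propensity)
    n = len(x)
    # actions[j, i] is the action of policy j on sample i.
    actions = (x[None, :] > thresholds[:, None]).astype(int)
    values = scores[np.arange(n)[None, :], actions].mean(axis=1)
    # Stage 1: keep every policy whose estimated value is within eps of the best.
    kept = thresholds[values >= values.max() - eps]

    def decide(x_new):
        # Stage 2: abstain wherever the kept policies disagree.
        x_new = np.asarray(x_new, dtype=float)
        acts = (x_new[None, :] > kept[:, None]).astype(int)
        disagree = acts.min(axis=0) != acts.max(axis=0)
        return np.where(disagree, ABSTAIN, acts[0])

    return decide


def estimated_value(scores, decisions, bonus):
    """IPW value where abstaining earns a uniform random guess plus `bonus`."""
    decisions = np.asarray(decisions, dtype=int)
    n = len(decisions)
    abstained = decisions == ABSTAIN
    chosen = scores[np.arange(n), np.where(abstained, 0, decisions)]
    random_guess = scores.mean(axis=1)  # average over the two actions
    return float(np.mean(np.where(abstained, random_guess + bonus, chosen)))
```

In this sketch, `decide` commits to an action only where every near-optimal threshold policy agrees, and returns `ABSTAIN` in the band of covariate values where they disagree. Widening `eps` keeps more policies and therefore enlarges the abstention region. When propensities are unknown, the abstract indicates that a doubly robust objective replaces the IPW scores; that variant is not shown here.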

