AI, LG | Mar 9

Learning When to Trust in Contextual Bandits

arXiv:2603.13356 | h-index: 2

AI Analysis

This paper addresses a subtle failure mode in robust reinforcement learning for systems that rely on evaluator feedback, and proposes a method intended to improve reliability in critical contexts.

The paper tackles the problem of evaluators being truthful in benign contexts but strategically biased in critical ones, termed Contextual Sycophancy, and shows that standard robust methods fail in this setting. It proposes CESA-LinUCB, which learns a high-dimensional trust boundary for each evaluator and achieves sublinear regret of $\tilde{O}(\sqrt{T})$ against contextual adversaries, recovering ground truth even without globally reliable evaluators.

Standard approaches to Robust Reinforcement Learning assume that feedback sources are either globally trustworthy or globally adversarial. In this paper, we challenge this assumption and identify a more subtle failure mode, which we term Contextual Sycophancy: evaluators are truthful in benign contexts but strategically biased in critical ones. We prove that standard robust methods fail in this setting, suffering from Contextual Objective Decoupling. To address this, we propose CESA-LinUCB, which learns a high-dimensional Trust Boundary for each evaluator. We prove that CESA-LinUCB achieves sublinear regret $\tilde{O}(\sqrt{T})$ against contextual adversaries, recovering the ground truth even when no evaluator is globally reliable.
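
To make the Contextual Sycophancy setting concrete, the following is a toy sketch only, not the paper's construction or the CESA-LinUCB algorithm. It assumes a linear ground-truth reward and a hypothetical evaluator whose "critical" region is a half-space; the names `sycophantic_feedback`, `critical_dir`, and `bias` are illustrative assumptions that do not come from the abstract.

```python
import numpy as np

def true_reward(x, theta):
    """Ground-truth linear reward for context x (assumed model)."""
    return float(x @ theta)

def sycophantic_feedback(x, theta, critical_dir, bias=1.0):
    """Hypothetical evaluator: truthful in benign contexts,
    inflated by `bias` in critical ones (critical_dir @ x > 0)."""
    r = true_reward(x, theta)
    if float(critical_dir @ x) > 0.0:
        return r + bias
    return r

def feedback_errors(n=1000, d=4, seed=0):
    """Sample contexts; return per-context feedback error and the critical mask."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)          # unknown true parameter
    critical_dir = rng.normal(size=d)   # defines the assumed critical half-space
    X = rng.normal(size=(n, d))
    err = np.array([
        sycophantic_feedback(x, theta, critical_dir) - true_reward(x, theta)
        for x in X
    ])
    critical = np.array([float(critical_dir @ x) > 0.0 for x in X])
    return err, critical

if __name__ == "__main__":
    err, critical = feedback_errors()
    print("mean error, benign contexts:  ", err[~critical].mean())
    print("mean error, critical contexts:", err[critical].mean())
    print("mean error, all contexts:     ", err.mean())
```

In this toy setup the evaluator's error is zero on benign contexts and equal to `bias` on critical ones, so a single global trust score would average the two regimes together. That is one way to read the abstract's point that treating a source as globally trustworthy or globally adversarial misses context-dependent bias.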
