LGApr 9, 2025

Regret Bounds for Robust Online Decision Making

arXiv:2504.06820v24 citationsh-index: 3COLT

Originality Incremental advance

AI Analysis

This work addresses robust decision-making under uncertainty for applications like bandits and reinforcement learning, offering a more realistic framework but is incremental in extending existing theories.

The authors tackled the problem of robust online decision making by proposing a framework that generalizes structured observations with multivalued models, allowing adversarial nature to choose distributions from convex sets. They derived regret bounds that characterize power-law learnability, demonstrating improvements in robust linear bandits and tabular robust online reinforcement learning, though computational efficiency was not addressed.

We propose a framework which generalizes "decision making with structured observations" by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner, that can be nonoblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits and tabular robust online reinforcement learning. In both cases, we derive regret bounds that improve state-of-the-art (except that we do not address computational efficiency).

View on arXiv PDF

Similar