LGAIGTMAOct 7, 2021

Online Markov Decision Processes with Non-oblivious Strategic Adversary

arXiv:2110.03604v37 citations
Originality Incremental advance
AI Analysis

This work addresses strategic decision-making in online environments for applications like game theory and AI, with incremental improvements in regret bounds and convergence results.

The paper tackles the problem of Online Markov Decision Processes with a non-oblivious strategic adversary, showing that an existing algorithm achieves a policy regret bound of O(√(T log(L)) + τ²√(T log(|A|))) and proposing a new algorithm, MDP-OOE, that improves this to O(√(T log(L)) + τ²√(T k log(k))) for games with small Nash equilibrium support, while also introducing an algorithm for last-round convergence to Nash equilibrium.

We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries can still apply and achieve a policy regret bound of $\mathcal{O}(\sqrt{T \log(L)}+τ^2\sqrt{ T \log(|A|)})$ where $L$ is the size of adversary's pure strategy set and $|A|$ denotes the size of agent's action space. Considering real-world games where the support size of a NE is small, we further propose a new algorithm: MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)}+τ^2\sqrt{ T k \log(k)})$ where $k$ depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and thus can solve games with prohibitively large action space. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of no-external regret adversary in OMDPs, we introduce an algorithm that achieves last-round convergence result to a NE. To our best knowledge, this is first work leading to the last iteration result in OMDPs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes