LG · PF · Jul 20, 2023

Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions

arXiv:2307.10524v2 · 12 citations · h-index: 56
Originality: Incremental advance
AI Analysis

This work addresses how to integrate untrusted machine-learned advice into sequential decision-making, a question relevant to researchers and practitioners in reinforcement learning. Its theoretical framework is an incremental advance over prior black-box methods.

The paper tackles the tradeoff between consistency and robustness in time-varying Markov Decision Processes (MDPs) with untrusted machine-learned advice. It uses Q-value predictions to choose dynamically between the advice and a robust baseline, which yields near-optimal performance guarantees that improve on those of black-box advice approaches.

We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees, which provably improve on what can be obtained solely with black-box advice.
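The idea of dynamically choosing between Q-value advice and a robust baseline can be pictured with a toy switching rule. The sketch below is illustrative only and is not the paper's algorithm. It assumes costs are minimized, a finite action set, and a hypothetical trust test based on the accumulated one-step temporal-difference (TD) error of the advice. All names (`greedy_action`, `td_error`, `choose_action`, `run`, `budget`) are made up for this example.

```python
def greedy_action(q_values):
    """Return the action with the lowest predicted cost-to-go."""
    return min(q_values, key=q_values.get)


def td_error(q_prev, cost, q_next_min):
    """Absolute one-step TD error of the Q-value advice (assumed trust signal)."""
    return abs(q_prev - (cost + q_next_min))


def choose_action(advice_action, baseline_action, accumulated_error, budget):
    """Follow the advice while its accumulated TD error stays within the budget;
    otherwise fall back to the robust baseline."""
    if accumulated_error <= budget:
        return advice_action
    return baseline_action


def run(steps, budget):
    """Each step is (q_now, observed_cost, q_next, baseline_action), where q_now
    and q_next map actions to predicted costs-to-go (hypothetical format)."""
    accumulated = 0.0
    chosen = []
    for q_now, cost, q_next, baseline_action in steps:
        advice_action = greedy_action(q_now)
        chosen.append(
            choose_action(advice_action, baseline_action, accumulated, budget)
        )
        # Large TD errors mean the predictions are not self-consistent,
        # so they eat into the trust budget for later steps.
        accumulated += td_error(q_now[advice_action], cost, min(q_next.values()))
    return chosen


steps = [
    ({"a": 1.0, "b": 2.0}, 0.5, {"a": 0.5, "b": 1.0}, "b"),  # advice is consistent
    ({"a": 3.0, "b": 0.2}, 4.0, {"a": 0.0, "b": 1.0}, "a"),  # advice badly off
    ({"a": 1.0, "b": 2.0}, 0.0, {"a": 1.0}, "b"),            # trust exhausted
]
print(run(steps, budget=0.1))
```

In this toy run the advice is followed for the first two steps. The large TD error observed at the second step exhausts the budget, so the third step falls back to the baseline. A black-box advisor that only outputs actions would not expose the Q-values needed for this kind of consistency check.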

