LG · AI · Jan 21, 2022

Meta Learning MDPs with Linear Transition Models

arXiv:2201.08732v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses meta-learning for MDPs with linear transition models, offering incremental improvements in regret for high-bias, low-variance task distributions.

The paper tackles meta-learning in Markov Decision Processes with linear transition models by proposing BUC-MatrixRL, which leverages training tasks to quickly solve test tasks drawn from the same distribution, and shows significant improvements in transfer regret for high-bias, low-variance task distributions.

We study meta-learning in Markov Decision Processes (MDPs) with linear transition models in the undiscounted episodic setting. Under a task sharedness metric based on model proximity, we study task families characterized by a distribution over models specified by a bias term and a variance component. We then propose BUC-MatrixRL, a version of the UC-MatrixRL algorithm, and show that it can meaningfully leverage a set of sampled training tasks to quickly solve a test task sampled from the same task distribution by learning an estimator of the bias parameter of the task distribution. The analysis leverages and extends results in the learning-to-learn linear regression and linear bandit settings to the more general case of MDPs with linear transition models. We prove that, compared to learning the tasks in isolation, BUC-MatrixRL provides significant improvements in the transfer regret for high-bias, low-variance task distributions.
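The abstract describes transfer through an estimator of the task distribution's bias parameter. A minimal sketch of that general idea follows, under assumptions of our own rather than the paper's: each task's core matrix is drawn as M_t = M_bar + sigma * noise (M_bar standing in for the bias term, sigma for the variance component), the model is fit by ridge regression on hypothetical (phi, y) feature pairs instead of UC-MatrixRL's full optimistic planning loop, and the bias estimate is the average of per-task estimates, used as the centre of the ridge penalty on a new task (the biased-regularization idea from learning-to-learn linear regression). All function names are hypothetical; this is not the paper's BUC-MatrixRL.

```python
import numpy as np

# Hypothetical task model (an assumption, not from the paper): M_t = M_bar + sigma * noise.
def sample_task_matrix(rng, m_bar, sigma):
    return m_bar + sigma * rng.standard_normal(m_bar.shape)

def sample_transitions(rng, m_true, n, noise=0.1):
    """Draw n pairs (phi_n, y_n) with y_n = M^T phi_n + Gaussian noise.

    A linear-regression proxy for the feature targets used when estimating
    a linear transition model; rows of the returned y are (M^T phi_n)^T + noise.
    """
    d = m_true.shape[0]
    phi = rng.standard_normal((n, d))
    y = phi @ m_true + noise * rng.standard_normal((n, m_true.shape[1]))
    return phi, y

def biased_ridge(phi, y, m0, lam):
    """Minimise sum_n ||y_n - M^T phi_n||^2 + lam * ||M - m0||_F^2.

    Setting the gradient to zero gives (Phi^T Phi + lam I) M = Phi^T Y + lam * m0.
    With m0 = 0 this is ordinary ridge regression, i.e. learning the task in isolation.
    """
    d = phi.shape[1]
    a = phi.T @ phi + lam * np.eye(d)
    return np.linalg.solve(a, phi.T @ y + lam * m0)

def estimate_bias(task_data, lam):
    """Average the per-task ridge estimates (centred at zero) over the training tasks."""
    estimates = [
        biased_ridge(phi, y, np.zeros((phi.shape[1], y.shape[1])), lam)
        for phi, y in task_data
    ]
    return np.mean(estimates, axis=0)

# Usage: a high-bias, low-variance task family (sigma small relative to M_bar).
rng = np.random.default_rng(0)
d, d_out = 5, 3
m_bar = rng.standard_normal((d, d_out))
sigma = 0.05

# Many samples on each of 20 training tasks give a good estimate of M_bar.
train = [
    sample_transitions(rng, sample_task_matrix(rng, m_bar, sigma), n=200)
    for _ in range(20)
]
bias_hat = estimate_bias(train, lam=1.0)

# The test task has only 4 samples in 5 dimensions, so isolated ridge is underdetermined.
m_test = sample_task_matrix(rng, m_bar, sigma)
phi, y = sample_transitions(rng, m_test, n=4)

isolated = biased_ridge(phi, y, np.zeros_like(bias_hat), lam=1.0)
transfer = biased_ridge(phi, y, bias_hat, lam=1.0)
print("isolated error:", np.linalg.norm(isolated - m_test))
print("transfer error:", np.linalg.norm(transfer - m_test))
```

When the task variance is small, shrinking toward the learned bias fills in directions the few test-task samples cannot identify; when the variance is large, the same penalty would pull the estimate toward a centre that is far from the test task, which matches the abstract's restriction of the benefit to high-bias, low-variance distributions.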
