LG | AI | RO | Mar 2, 2021

Minimax Model Learning

arXiv:2103.02084v1 | 19 citations
AI Analysis

This paper addresses robustness issues in model-based reinforcement learning under model misspecification or distribution shift, though it appears incremental because it builds on prior off-policy evaluation techniques.

The paper tackles distribution shift in model-based reinforcement learning by introducing an off-policy loss function for learning the transition model, derived from the off-policy policy evaluation objective, and reports empirical improvements over existing model-based off-policy evaluation methods.

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.
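The abstract does not state the loss itself, so the following is only a minimal sketch of what a model loss derived from an off-policy evaluation objective could look like. It assumes a minimax form: the model's one-step prediction of a value function V is compared with V at the observed next state, the difference is reweighted by a function w over state-action pairs, and the model is chosen to minimize the worst case over candidate (w, V) pairs. The function names `mml_loss` and `select_model`, the tabular toy MDP, and the small finite classes of w and V are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def mml_loss(w, V, P, data):
    """Assumed loss form: mean of w(s,a) * (E_{x~P(.|s,a)}[V(x)] - V(s'))."""
    s, a, s_next = data
    predicted = P[s, a] @ V  # model's expected V at the next state, one value per sample
    return np.mean(w[s, a] * (predicted - V[s_next]))

def select_model(models, ws, Vs, data):
    """Pick the model whose worst-case |loss| over all (w, V) pairs is smallest."""
    worst = [max(abs(mml_loss(w, V, P, data)) for w in ws for V in Vs)
             for P in models]
    return int(np.argmin(worst)), worst

# Toy MDP (hypothetical): 2 states, 2 actions; action 0 stays, action 1 flips.
P_true = np.zeros((2, 2, 2))
for state in range(2):
    P_true[state, 0, state] = 1.0
    P_true[state, 1, 1 - state] = 1.0
P_uniform = np.full((2, 2, 2), 0.5)  # misspecified candidate model

# One logged transition per (s, a) pair, generated by the true dynamics.
s = np.array([0, 0, 1, 1])
a = np.array([0, 1, 0, 1])
s_next = np.array([0, 1, 1, 0])
data = (s, a, s_next)

# Small illustrative classes for the reweighting function w and the value function V.
ws = [np.ones((2, 2)), np.array([[2.0, 0.0], [0.0, 0.0]])]
Vs = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]

best, worst = select_model([P_uniform, P_true], ws, Vs, data)
print(best, worst)
```

In this toy setup the uniform weights alone give the misspecified model a loss of zero because its errors cancel across samples; the second, concentrated weight function is what exposes the error. That is the intuition behind taking the worst case over w: the model has to be accurate under reweightings of the data, not only on average under the data-generating distribution.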

