LG Feb 25, 2024

DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

Anthony Liang, Guy Tennenholtz, Chih-wei Hsu, Yinlam Chow, Erdem Bıyık, Craig Boutilier

arXiv:2402.15957v29.24 citations h-index: 28 NIPS

Originality: Incremental advance

AI Analysis

This work addresses a specific challenge in meta-RL for dynamic environments, but the advance appears incremental: it builds on existing meta-RL methods with targeted modifications.

The paper tackles meta-reinforcement learning in environments where the latent state evolves at varying rates. It introduces DynaMITE-RL, which models episode sessions (parts of the episode where the latent state is fixed) and adds three modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent conditioning. The authors report that DynaMITE-RL significantly outperforms state-of-the-art baselines in sample efficiency and inference returns across multiple domains.

We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates. We model episode sessions - parts of the episode where the latent state is fixed - and propose three key modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent conditioning. We demonstrate the importance of these modifications in various domains, ranging from discrete Gridworld environments to continuous-control and simulated robot assistive tasks, demonstrating that DynaMITE-RL significantly outperforms state-of-the-art baselines in sample efficiency and inference returns.
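As a rough illustration of the session idea, the sketch below splits an episode into sessions and builds a same-session mask. It is not the authors' implementation: the `session_ends` flag, the function names, and the assumption that masking means "only pair timesteps from the same session" are all hypothetical, since the abstract does not specify how the mask enters the training objective.

```python
from itertools import accumulate


def session_ids(session_ends):
    """Assign a session index to every timestep of one episode.

    session_ends[t] is a hypothetical flag that is True when a session
    ends at step t, so step t + 1 starts the next session.
    """
    if not session_ends:
        return []
    # Count how many sessions ended strictly before each step.
    ended_so_far = list(accumulate(int(e) for e in session_ends))
    return [0] + ended_so_far[:-1]


def session_mask(session_ends):
    """Return mask[i][j] = 1 if steps i and j share a session, else 0."""
    ids = session_ids(session_ends)
    return [[int(a == b) for b in ids] for a in ids]


# Toy episode: sessions end after steps 2 and 4.
ends = [False, False, True, False, True, False]
ids = session_ids(ends)
mask = session_mask(ends)
```

Under these assumptions, steps 0 to 2 form the first session, steps 3 and 4 the second, and step 5 the third. A mask like this could be multiplied into a per-timestep loss so that terms pairing timesteps from different sessions, where the latent state may have changed, contribute nothing.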
