LG, AI | Feb 26, 2025

Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning

Georgia Tech
arXiv:2502.19009v1 | 10 citations | h-index: 7 | ICLR
Originality: Highly original
AI Analysis

This work addresses sample efficiency and suboptimal behavior in in-context RL, though it is incremental in that it builds on existing Transformer-based RL methods.

The paper tackles the problem of Transformers inheriting the suboptimal behaviors of the RL algorithms they imitate during in-context learning. It proposes DICP, a framework that integrates model-based planning to improve policies without a separate dynamics model, and reports state-of-the-art performance with fewer environment interactions.

Recent studies have shown that Transformers can perform in-context reinforcement learning (RL) by imitating existing RL algorithms, enabling sample-efficient adaptation to unseen tasks without parameter updates. However, these models also inherit the suboptimal behaviors of the RL algorithms they imitate. This issue primarily arises due to the gradual update rule employed by those algorithms. Model-based planning offers a promising solution to this limitation by allowing the models to simulate potential outcomes before taking action, providing an additional mechanism to deviate from the suboptimal behavior. Rather than learning a separate dynamics model, we propose Distillation for In-Context Planning (DICP), an in-context model-based RL framework where Transformers simultaneously learn environment dynamics and improve policy in-context. We evaluate DICP across a range of discrete and continuous environments, including Darkroom variants and Meta-World. Our results show that DICP achieves state-of-the-art performance while requiring significantly fewer environment interactions than baselines, which include both model-free counterparts and existing meta-RL methods.
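The core idea of model-based planning, simulating potential outcomes before acting, can be illustrated with a minimal sketch. It assumes a model that, given the context (learning history), a state, and an action, predicts a reward and a next state; in DICP that role is played by the same Transformer that predicts actions, but here `toy_model` is a hypothetical stand-in for a 5x5 Darkroom-style grid that ignores the history. The planner is plain exhaustive search over action sequences, a generic technique chosen for clarity and not the paper's exact planning procedure. The names `plan`, `toy_model`, `GRID`, `GOAL`, and `MOVES` are all hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]
# Hypothetical interface for the in-context model: given the context
# (learning history), a state, and an action, return (reward, next_state).
Model = Callable[[Sequence, State, int], Tuple[float, State]]

GRID = 5                      # hypothetical 5x5 Darkroom-style grid
GOAL: State = (4, 4)          # hypothetical goal cell
# Hypothetical action set: stay, up, down, left, right as (dx, dy) offsets.
MOVES = {0: (0, 0), 1: (0, 1), 2: (0, -1), 3: (-1, 0), 4: (1, 0)}
ACTIONS: List[int] = list(MOVES)


def toy_model(history: Sequence, state: State, action: int) -> Tuple[float, State]:
    """Stand-in for the Transformer's reward/next-state predictions.

    It ignores the history; a real in-context model would condition on it.
    Moves are clipped at the grid border, and reward is 1 on the goal cell.
    """
    dx, dy = MOVES[action]
    nxt = (min(max(state[0] + dx, 0), GRID - 1),
           min(max(state[1] + dy, 0), GRID - 1))
    return (1.0 if nxt == GOAL else 0.0), nxt


def plan(model: Model, history: Sequence, state: State,
         actions: Sequence[int], horizon: int, gamma: float = 1.0) -> int:
    """Simulate every action sequence of length `horizon` with the model
    and return the first action of the sequence with the highest return."""
    best_return, best_action = float("-inf"), actions[0]
    for seq in product(actions, repeat=horizon):
        s, total, discount = state, 0.0, 1.0
        for a in seq:
            r, s = model(history, s, a)   # imagined step, no env interaction
            total += discount * r
            discount *= gamma
        if total > best_return:           # ties keep the earliest sequence
            best_return, best_action = total, seq[0]
    return best_action


print(plan(toy_model, [], (3, 4), ACTIONS, horizon=1))  # moves right onto the goal
```

In a receding-horizon loop, only the returned action would be executed, the real transition appended to the history, and planning repeated. Exhaustive search costs |A|^H model calls, so it is only practical for short horizons; if the goal is more than H steps away, every sequence scores zero and the planner falls back to the first action.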

Code Implementations: 1 repo