LG · AI · MA · Sep 30, 2025

In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks

arXiv:2510.00347v1
Originality: Incremental advance
AI Analysis

This paper addresses generalization issues in in-context reinforcement learning agents, but its results are preliminary and incremental.

The paper addresses the tendency of Decision-Pretrained Transformers (DPTs) to generalize poorly beyond their pretraining data distribution in decision-making tasks. It proposes the Prediction-Powered Transformer (PPT) framework with in-context curiosity and reports improved robustness in Gaussian multi-armed bandit experiments, where PPT moderates the performance degradation that DPT shows in higher-variance test environments.

As large language models (LLMs) continue to grow in capability, there is increasing interest in incorporating them into decision-making tasks. A common pipeline for this is Decision-Pretrained Transformers (DPTs). However, existing training methods for DPTs often struggle to generalize beyond their pretraining data distribution. To explore mitigation of this limitation, we propose in-context curiosity -- a lightweight, exploration-inspired regularizer for offline pretraining -- and introduce the Prediction-Powered Transformer (PPT) framework. PPT augments DPT with an auxiliary reward predictor, using prediction error as an intrinsic curiosity signal to encourage broader exploration during training. In proof-of-concept experiments on Gaussian multi-armed bandits, PPT shows improved robustness: it moderates the performance degradation observed in DPT when test environments exhibit higher variance in reward, particularly when pretraining data has limited diversity. While the quality of offline data remains fundamental, our preliminary results suggest that curiosity-driven pretraining offers a promising direction for enhancing out-of-distribution generalization in in-context RL agents.
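
The general idea of using reward-prediction error as an intrinsic curiosity signal can be illustrated with a toy sketch on a Gaussian multi-armed bandit. This is not the paper's PPT method, which trains a transformer offline with a curiosity regularizer; it is a simplified online agent written only to show the mechanism. The function name `run_curiosity_bandit`, the running-mean predictor, the smoothing factor of 0.5, and the weight `beta` are all assumptions made for illustration.

```python
import random


def run_curiosity_bandit(means, sigma, steps, beta, seed=0):
    """Toy agent (hypothetical, not the paper's PPT): choose the arm that
    maximizes predicted reward plus beta times a smoothed prediction error."""
    rng = random.Random(seed)
    k = len(means)
    pred = [0.0] * k       # auxiliary reward predictor: running mean per arm
    counts = [0] * k       # number of pulls per arm
    curiosity = [1.0] * k  # optimistic start so untried arms look interesting
    total = 0.0

    for _ in range(steps):
        # Score = extrinsic estimate + intrinsic curiosity bonus.
        scores = [pred[a] + beta * curiosity[a] for a in range(k)]
        arm = max(range(k), key=lambda a: scores[a])

        # Gaussian reward with arm-specific mean and shared standard deviation.
        reward = rng.gauss(means[arm], sigma)

        # Squared prediction error acts as the intrinsic signal.
        error = (reward - pred[arm]) ** 2

        # Update the predictor (incremental mean) and the smoothed curiosity.
        counts[arm] += 1
        pred[arm] += (reward - pred[arm]) / counts[arm]
        curiosity[arm] = 0.5 * curiosity[arm] + 0.5 * error

        total += reward

    return pred, counts, total


# Example call: three arms, moderate noise, 200 pulls.
pred, counts, total = run_curiosity_bandit([0.2, 0.5, 0.8], 0.3, 200, 1.0)
```

One design caveat of this sketch: with noisy rewards the squared error never decays to zero, so the bonus settles near the reward variance rather than vanishing. That is acceptable for illustrating the signal, but a practical agent would need to separate reducible prediction error from irreducible noise.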
