LG · AI · Nov 13, 2025

Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning

arXiv:2511.10251v1
Originality: Incremental advance
AI Analysis

This work targets better decision-making in reinforcement learning and is relevant to both researchers and practitioners. The advance is incremental, since it builds on prior transformer-based methods.

The paper improves in-context reinforcement learning by augmenting the in-context dataset with a belief distribution over rewards. It reports consistent performance gains over comparable baselines across the Darkroom, Miniworld, and MuJoCo environments.

Transformers have demonstrated exceptional in-context learning (ICL) capabilities, enabling applications across natural language processing, computer vision, and sequential decision-making. In reinforcement learning, ICL reframes learning as a supervised problem, facilitating task adaptation without parameter updates. Building on prior work leveraging transformers for sequential decision-making, we propose Heuristic Transformer (HT), an in-context reinforcement learning (ICRL) approach that augments the in-context dataset with a belief distribution over rewards to achieve better decision-making. Using a variational auto-encoder (VAE), a low-dimensional stochastic variable is learned to represent the posterior distribution over rewards, which is incorporated alongside an in-context dataset and query states as a prompt to the transformer policy. We assess the performance of HT across the Darkroom, Miniworld, and MuJoCo environments, showing that it consistently surpasses comparable baselines in terms of both effectiveness and generalization. Our method presents a promising direction to bridge the gap between belief-based augmentations and transformer-based decision-making.
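
The prompt construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the encoder below is an untrained stand-in (mean pooling followed by fixed random linear maps) rather than a trained VAE, and every name (`Transition`, `ToyBeliefEncoder`, `sample_belief`, `build_prompt`) as well as the token ordering (belief first, then transitions, then the query state) is a hypothetical choice made for illustration. It only shows how a sampled low-dimensional belief variable might sit alongside the in-context dataset and a query state in one prompt.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Transition:
    """One (state, action, reward, next_state) tuple from the in-context dataset."""
    state: List[float]
    action: List[float]
    reward: float
    next_state: List[float]


def flatten(t: Transition) -> List[float]:
    """Concatenate a transition into a single feature vector."""
    return [*t.state, *t.action, t.reward, *t.next_state]


class ToyBeliefEncoder:
    """Untrained stand-in for a VAE encoder (assumption, not the paper's model).

    Mean-pools the context transitions, then applies two fixed random linear
    maps to produce the mean and log-variance of a diagonal Gaussian.
    """

    def __init__(self, input_dim: int, latent_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def make_matrix() -> List[List[float]]:
            return [[rng.gauss(0.0, 0.1) for _ in range(input_dim)]
                    for _ in range(latent_dim)]

        self.w_mu = make_matrix()
        self.w_logvar = make_matrix()

    def encode(self, context: Sequence[Transition]) -> Tuple[List[float], List[float]]:
        feats = [flatten(t) for t in context]
        pooled = [sum(col) / len(feats) for col in zip(*feats)]
        mu = [sum(w * x for w, x in zip(row, pooled)) for row in self.w_mu]
        logvar = [sum(w * x for w, x in zip(row, pooled)) for row in self.w_logvar]
        return mu, logvar


def sample_belief(mu: Sequence[float], logvar: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Reparameterized sample: z = mu + exp(0.5 * logvar) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def build_prompt(context: Sequence[Transition], query_state: Sequence[float],
                 belief: Sequence[float]) -> List[Tuple[str, List[float]]]:
    """Assemble tagged tokens: belief, then each transition, then the query state."""
    prompt: List[Tuple[str, List[float]]] = [("belief", list(belief))]
    prompt += [("transition", flatten(t)) for t in context]
    prompt.append(("query", list(query_state)))
    return prompt


if __name__ == "__main__":
    context = [
        Transition([0.0, 0.0], [1.0], 0.0, [0.0, 1.0]),
        Transition([0.0, 1.0], [1.0], 1.0, [0.0, 2.0]),
    ]
    encoder = ToyBeliefEncoder(input_dim=6, latent_dim=4)
    mu, logvar = encoder.encode(context)
    z = sample_belief(mu, logvar, random.Random(0))
    for tag, values in build_prompt(context, [0.0, 2.0], z):
        print(tag, len(values))
```

In a real system the tagged tokens would be embedded and passed to a transformer policy that predicts an action for the query state. That step is omitted here because the abstract does not specify it.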
