LG · Sep 11, 2024

Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning

arXiv:2409.06985v2 · 2 citations · h-index: 6
AI Analysis

This work addresses a bottleneck in using pretrained language models for offline RL by improving their adaptability to long-term tasks. The contribution is incremental, however, because it builds on existing decision transformer methods.

The paper identifies a 'Markov head' in pretrained language models that places extreme attention on the last input token, which limits performance in offline reinforcement learning to short-term environments. It then proposes GPT2-DTMA, which adds a Mixture of Attention, to improve long-term performance: the method achieves comparable results in short-term environments and significantly narrows the performance gap in long-term ones.
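"Extreme attention on the last input token" can be made concrete with a simple per-head statistic. The sketch below is a hypothetical illustration, not the paper's head-analysis procedure: `last_token_share` averages the weight each query places on the most recent token it can see under a causal mask, and `looks_markov` flags a head whose share exceeds an arbitrary `threshold` of 0.9. All function names and the threshold are assumptions; it uses only the Python standard library.

```python
import math
from typing import List

Matrix = List[List[float]]

def causal_softmax_attention(scores: Matrix) -> Matrix:
    """Row-wise softmax over raw scores with a causal mask (query i sees keys 0..i)."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]            # causal mask: future keys are dropped
        m = max(visible)                  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        # Masked (future) positions get zero weight.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

def last_token_share(weights: Matrix) -> float:
    """Mean weight each query puts on the most recent visible token (the diagonal).

    Query 0 is skipped: it can only attend to itself, so its share is always 1.
    """
    shares = [weights[i][i] for i in range(1, len(weights))]
    return sum(shares) / len(shares)

def looks_markov(weights: Matrix, threshold: float = 0.9) -> bool:
    """Hypothetical rule: flag a head whose last-token share exceeds `threshold`."""
    return last_token_share(weights) > threshold

# Usage: a head with strongly peaked diagonal scores versus a head with flat scores.
T = 4
peaked = [[10.0 if j == i else 0.0 for j in range(T)] for i in range(T)]
uniform = [[0.0] * T for _ in range(T)]
print(looks_markov(causal_softmax_attention(peaked)))   # True
print(looks_markov(causal_softmax_attention(uniform)))  # False
```

The flat head spreads each query's weight evenly over the visible tokens, so its share is about 0.36, while the peaked head concentrates almost all weight on the latest token.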

Recently, incorporating knowledge from pretrained language models (PLMs) into decision transformers (DTs) has attracted significant attention in offline reinforcement learning (RL). These PLMs perform well in RL tasks, raising an intriguing question: what kind of knowledge from PLMs has been transferred to RL to achieve such good results? This work first investigates this question by analyzing each attention head quantitatively, and identifies the Markov head, a crucial component that exists in the attention heads of PLMs. The Markov head leads to extreme attention on the last input token and to good performance only in short-term environments. Furthermore, we prove that this extreme attention cannot be changed by re-training the embedding layer or by fine-tuning. Inspired by this analysis, we propose a general method, GPT2-DTMA, which equips a pretrained DT with a Mixture of Attention (MoA) to accommodate diverse attention requirements during fine-tuning. Extensive experiments corroborate our theorems and demonstrate the effectiveness of GPT2-DTMA: it achieves comparable performance in short-term environments while significantly narrowing the performance gap in long-term environments.
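To illustrate how mixing attention patterns can serve diverse attention requirements, here is a minimal sketch of a generic gated mixture of attention maps. It is an assumption for illustration only and is not presented as the paper's MoA design: `mix_attention` takes a convex combination of per-head attention maps weighted by `softmax(gate_logits)`, where `gate_logits` stands in for hypothetical learned parameters (in a trained model they would be optimized and could depend on the input). The example mixes a last-token ("Markov-like") head with a uniform long-range head.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mix_attention(heads: Sequence[Matrix], gate_logits: Sequence[float]) -> Matrix:
    """Convex combination of per-head attention maps, weighted by softmax(gate_logits).

    Each row of every head sums to 1, so each mixed row also sums to 1.
    """
    gates = softmax(gate_logits)
    T = len(heads[0])
    return [
        [sum(g * h[i][j] for g, h in zip(gates, heads)) for j in range(T)]
        for i in range(T)
    ]

def apply_attention(weights: Matrix, values: Matrix) -> Matrix:
    """Weighted sum of value vectors: out[i] = sum_j weights[i][j] * values[j]."""
    d = len(values[0])
    return [
        [sum(w * v[k] for w, v in zip(row, values)) for k in range(d)]
        for row in weights
    ]

# Usage: a head attending only to the latest token versus a causal uniform head.
T = 4
markov = [[1.0 if j == i else 0.0 for j in range(T)] for i in range(T)]
uniform = [[1.0 / (i + 1) if j <= i else 0.0 for j in range(T)] for i in range(T)]
values = [[1.0], [2.0], [3.0], [4.0]]

mixed = mix_attention([markov, uniform], gate_logits=[0.0, 0.0])  # equal gates
out = apply_attention(mixed, values)
print(mixed[3])  # [0.125, 0.125, 0.125, 0.625]
print(out[3])    # [3.25]
```

With equal gates, the last query keeps extra weight on the latest token but still draws on earlier tokens; shifting `gate_logits` toward the uniform head would move attention further back in the history.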

