LG · AI · Feb 1

TQL: Scaling Q-Functions with Transformers by Preventing Attention Collapse

arXiv:2602.01439v1 · 5 citations
Originality: Highly original
AI Analysis

The paper addresses a key bottleneck in reinforcement learning: value functions become unstable when scaled up with transformers. Enabling stable scaling is an incremental but important step for advancing RL methods.

The paper tackles the problem of scaling value functions in reinforcement learning with transformers, which often leads to instability and worse performance. It identifies attention collapse as the critical failure mode and proposes TQL, which prevents the collapse by controlling the entropy of the attention scores. The result is up to a 43% performance improvement when scaling from the smallest to the largest network sizes.

Despite scale driving substantial recent advancements in machine learning, reinforcement learning (RL) methods still primarily use small value functions. Naively scaling value functions -- including with a transformer architecture, which is known to be highly scalable -- often results in learning instability and worse performance. In this work, we ask: what prevents transformers from scaling effectively for value functions? Through empirical analysis, we identify the critical failure mode in this scaling: attention scores collapse as capacity increases. Our key insight is that we can effectively prevent this collapse and stabilize training by controlling the entropy of the attention scores, thereby enabling the use of larger models. To this end, we propose Transformer Q-Learning (TQL), a method that unlocks the scaling potential of transformers in learning value functions in RL. Our approach yields up to a 43% improvement in performance when scaling from the smallest to the largest network sizes, while prior methods suffer from performance degradation.
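
The abstract does not spell out how the entropy of the attention scores is measured or controlled, so the following is only an illustrative sketch and not the TQL method. It assumes that "collapse" means the softmax attention weights for a query concentrate on very few keys, which shows up as Shannon entropy falling toward zero. The function names (`softmax`, `attention_entropy`) and the example logits are hypothetical, chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(logits):
    """Shannon entropy (in nats) of the attention weights for one query.

    Uniform attention over n keys gives log(n); attention concentrated
    on a single key gives a value near 0.
    """
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical logits for one query attending over four keys.
logits = [0.5, -0.2, 0.1, 0.3]

# Scaling the logits up (as can happen when weights grow) sharpens the
# softmax and drives the entropy down: the assumed collapse regime.
for scale in (1.0, 10.0, 100.0):
    scaled = [scale * x for x in logits]
    print(f"scale={scale}: entropy={attention_entropy(scaled):.4f} "
          f"(max possible {math.log(len(logits)):.4f})")
```

A diagnostic like this could be logged per attention head during training to see whether entropy drops as model capacity grows. How TQL actually regulates the entropy is described in the paper itself, not here.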

