A recurrent vision transformer shows signatures of primate visual attention
This work addresses the disconnect between research on attention in biological and artificial intelligence, offering a model that captures key aspects of primate visual attention. The contribution is incremental, since it builds on existing transformer and recurrent methods.
The authors bridge animal attention and AI self-attention by proposing a Recurrent Vision Transformer that integrates self-attention with recurrent memory. Trained on a spatially cued orientation change detection task used in primate studies, the model shows improved accuracy and faster responses for cued stimuli, with both effects scaling with cue validity. Analysis of its self-attention maps shows dynamic spatial prioritization, and targeted perturbations produce performance shifts similar to those observed in primate frontal eye fields and superior colliculus.
Attention is fundamental to both biological and artificial intelligence, yet research on animal attention and AI self-attention remains largely disconnected. We propose a Recurrent Vision Transformer (Recurrent ViT) that integrates self-attention with recurrent memory, allowing both current inputs and stored information to guide attention allocation. Trained solely via sparse reward feedback on a spatially cued orientation change detection task, a paradigm used in primate studies, our model exhibits primate-like signatures of attention, including improved accuracy and faster responses for cued stimuli that scale with cue validity. Analysis of self-attention maps reveals dynamic spatial prioritization with reactivation prior to expected changes, and targeted perturbations produce performance shifts similar to those observed in primate frontal eye fields and superior colliculus. These findings demonstrate that incorporating recurrent feedback into self-attention can capture key aspects of primate visual attention.
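To make the central idea concrete, the following is a minimal sketch of one way stored information could guide attention allocation alongside the current input. It is not the authors' architecture: the class name `RecurrentAttentionSketch`, the single attention head, the concatenation of memory tokens into the keys and values, and the leaky memory update with rate `alpha` are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class RecurrentAttentionSketch:
    """Single-head self-attention whose keys and values include a recurrent memory.

    Hypothetical illustration only: layer sizes, the memory update rule,
    and all names are assumptions, not taken from the paper.
    """

    def __init__(self, n_patches, dim, alpha=0.5, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        self.alpha = alpha  # assumed leak rate of the memory update
        self.Wq = rng.normal(0.0, scale, (dim, dim))
        self.Wk = rng.normal(0.0, scale, (dim, dim))
        self.Wv = rng.normal(0.0, scale, (dim, dim))
        # One memory token per patch position, empty at trial start.
        self.memory = np.zeros((n_patches, dim))

    def step(self, patches):
        """Process one time step of patch embeddings, shape (n_patches, dim)."""
        # Queries come from the current input only.
        q = patches @ self.Wq
        # Keys and values come from the current input AND the stored memory,
        # so both can influence where attention is allocated.
        context = np.concatenate([patches, self.memory], axis=0)
        k = context @ self.Wk
        v = context @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)
        out = attn @ v
        # Leaky recurrent update: memory drifts toward the attended output.
        self.memory = (1.0 - self.alpha) * self.memory + self.alpha * out
        return out, attn


model = RecurrentAttentionSketch(n_patches=4, dim=8)
frame = np.random.default_rng(1).normal(size=(4, 8))
out_t0, attn_t0 = model.step(frame)  # memory is empty on the first step
out_t1, attn_t1 = model.step(frame)  # same input, but memory now carries state
```

Feeding the same frame twice yields different attention maps, because the second step attends over a non-empty memory. In this toy setting, that is the sense in which attention depends on history as well as on the current stimulus; a cue shown early in a trial could, under these assumptions, bias attention at later time steps.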