LG · AI · Nov 24, 2025

On the Role of Hidden States of Modern Hopfield Network in Transformer

arXiv:2511.20698v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses architectural limitations in Transformers for deep learning applications, offering an incremental improvement based on associative memory theory.

The paper tackles the problem of rank collapse and token uniformity in deep Transformers by introducing modern Hopfield attention (MHA), which incorporates hidden states from modern Hopfield networks into self-attention, showing improved accuracy without extra parameters in Vision Transformer and GPT.
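To make "token uniformity" concrete, here is a toy sketch rather than the paper's experiment. It assumes a stack of pure softmax self-attention layers with identity query/key/value projections and no residual connections or MLP blocks. The helper names (`pure_attention_layer`, `token_spread`, `spread_by_depth`) are hypothetical. Each layer replaces every token with a convex combination of all tokens, so the largest pairwise distance between tokens can only shrink with depth.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pure_attention_layer(x):
    """Self-attention with identity projections, no residual, no MLP (toy assumption)."""
    d = x.shape[-1]
    weights = softmax(x @ x.T / np.sqrt(d))  # (n_tokens, n_tokens), rows sum to 1
    return weights @ x


def token_spread(x):
    """Largest pairwise Euclidean distance between token vectors (a simple uniformity proxy)."""
    diffs = x[:, None, :] - x[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())


def spread_by_depth(x, depth):
    """Record the token spread at the input and after each attention layer."""
    spreads = [token_spread(x)]
    for _ in range(depth):
        x = pure_attention_layer(x)
        spreads.append(token_spread(x))
    return spreads


rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))  # 8 tokens, 16 dimensions (arbitrary toy sizes)
spreads = spread_by_depth(tokens, depth=6)
print(spreads)
```

The printed list is non-increasing: in this stripped-down setting the tokens drift toward one another layer by layer, which is the tendency the paper aims to counteract.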

Associative memory models based on Hopfield networks and self-attention based on key-value mechanisms have been popular approaches in the study of memory mechanisms in deep learning. It has been pointed out that the state update rule of the modern Hopfield network (MHN) in the adiabatic approximation agrees with the self-attention layer of the Transformer. In this paper, we go beyond this approximation and investigate the relationship between the MHN and self-attention. Our results show that the correspondence between Hopfield networks and Transformers can be established in a more general form by adding a new variable, the hidden state derived from the MHN, to self-attention. This new attention mechanism, modern Hopfield attention (MHA), allows attention scores to be inherited from the input layer of the Transformer through to the output layer, which greatly improves the behavior of the attention weights. In particular, we show both theoretically and empirically that MHA hidden states significantly mitigate two serious problems of deep Transformers, known as rank collapse and token uniformity. We also confirm that MHA systematically improves accuracy without adding trainable parameters to the Vision Transformer or GPT. Our results provide a new case in which Hopfield networks offer a useful perspective for improving the Transformer architecture.
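The known correspondence that the abstract starts from can be checked numerically. The sketch below assumes the commonly cited form of the MHN retrieval update, in which the new state is a softmax-weighted sum of the stored patterns, and compares it with single-query softmax attention whose keys and values are both the stored patterns. It does not implement the paper's hidden states or MHA itself, since the abstract does not give that formulation. The function names and toy patterns are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def hopfield_update(patterns, state, beta):
    """One retrieval step: patterns^T softmax(beta * patterns @ state).

    patterns: (n_patterns, dim), one stored pattern per row.
    state:    (dim,), the current query state.
    """
    return patterns.T @ softmax(beta * (patterns @ state))


def attention(query, keys, values):
    """Single-query scaled dot-product attention."""
    d = keys.shape[-1]
    return softmax(query @ keys.T / np.sqrt(d)) @ values


# Three well-separated stored patterns (toy assumption) and a noisy copy of the first.
patterns = 4.0 * np.eye(3)
noisy = np.array([3.5, 0.5, 0.2])

# Choosing beta = 1/sqrt(dim) makes the two computations identical.
beta = 1.0 / np.sqrt(patterns.shape[1])
retrieved = hopfield_update(patterns, noisy, beta)
attended = attention(noisy, patterns, patterns)

print(np.allclose(retrieved, attended))
print(retrieved)
```

With keys and values tied to the stored patterns, the two functions compute the same quantity, and the noisy query is pulled back close to the first stored pattern in a single step.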

