CV · AI · LG · May 28, 2023

Key-Value Transformer

arXiv:2305.19129v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the efficiency and design of transformers for AI practitioners. It is an incremental advance: it builds on the existing QKV formulation and does not show conclusive superiority over it.

The paper examines whether all three Query, Key, and Value (QKV) components are essential in transformers. It tests a Key-Value (KV) formulation that produces symmetric attention maps, along with an asymmetric variant that adds a 2D positional encoding to the attention matrix. The KV design needs fewer parameters and less computation, and it occasionally outperforms QKV transformers on tasks such as list reversal, classification, and translation, but it underperforms them in other cases.
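The parameter saving can be made concrete with a small count. This is a sketch under stated assumptions, not the paper's accounting: it assumes every projection is a `d_model × d_model` linear layer with an optional bias, that both variants keep one output projection `W_o`, and that the KV variant simply drops `W_q`. The helper `attention_params` is hypothetical. Splitting into heads does not change these totals, because the heads partition `d_model`.

```python
def attention_params(d_model: int, n_input_projections: int, bias: bool = True) -> int:
    """Parameters in one attention layer's linear projections (hypothetical helper).

    n_input_projections is 3 for QKV (W_q, W_k, W_v) and 2 for KV (W_k, W_v).
    One output projection W_o is assumed in both variants.
    """
    # Each projection is a d_model x d_model weight matrix plus an optional bias vector.
    per_projection = d_model * d_model + (d_model if bias else 0)
    return (n_input_projections + 1) * per_projection


d = 512
qkv = attention_params(d, 3)
kv = attention_params(d, 2)
print(qkv, kv, qkv - kv)  # 1050624 787968 262656
```

Under these assumptions the KV layer saves one `d_model × d_model` projection per attention layer, and for a sequence of length `n` it also skips one `n × d_model × d_model` matrix multiplication when computing the projections.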

Transformers have emerged as the prevailing standard solution for various AI tasks, including computer vision and natural language processing. The widely adopted Query, Key, and Value (QKV) formulation has played a significant role in this. Nevertheless, no research has examined whether all three components are essential to transformer performance. We therefore evaluated a key-value (KV) formulation, which generates symmetric attention maps, along with an asymmetric version that incorporates a 2D positional encoding into the attention matrix. Remarkably, this transformer requires fewer parameters and less computation than the original one. Through experiments spanning three task types (synthetic tasks such as reversing or sorting a list, vision tasks such as MNIST or CIFAR classification, and NLP tasks such as character generation and translation), we found that the KV transformer occasionally outperforms the QKV transformer. However, it also underperforms QKV in some instances, making it difficult to draw a definitive conclusion. Nonetheless, we consider the reported results encouraging and anticipate that they may pave the way for more efficient transformers in the future.
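A minimal single-head NumPy sketch of the difference between the two formulations follows. It assumes that the KV formulation reuses the key projection in place of the query, which gives scores `K Kᵀ / sqrt(d)`. It also models the asymmetric variant as a hypothetical learned `(n, n)` term `pos_bias` added to the scores. The paper's actual 2D positional encoding may be constructed differently, and all function names here are illustrative. Only the pre-softmax score matrix is symmetric, because the row-wise softmax generally breaks exact symmetry.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def qkv_scores(x, w_q, w_k):
    # Standard scaled dot-product scores: (X W_q)(X W_k)^T / sqrt(d).
    q, k = x @ w_q, x @ w_k
    return q @ k.T / np.sqrt(k.shape[-1])


def kv_scores(x, w_k, pos_bias=None):
    # KV variant (assumed form): keys play both roles, so scores are K K^T / sqrt(d).
    # Without pos_bias this matrix is symmetric.
    k = x @ w_k
    s = k @ k.T / np.sqrt(k.shape[-1])
    if pos_bias is not None:
        # Hypothetical asymmetric variant: add an (n, n) positional term to the scores.
        s = s + pos_bias
    return s


def attend(scores, x, w_v):
    # Row-wise softmax over the scores, then mix the value vectors.
    return softmax(scores, axis=-1) @ (x @ w_v)


rng = np.random.default_rng(0)
n, d = 5, 8  # sequence length and model width (arbitrary toy sizes)
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
pos_bias = rng.standard_normal((n, n))  # stand-in for a learned positional term

s_kv = kv_scores(x, w_k)
s_qkv = qkv_scores(x, w_q, w_k)
s_asym = kv_scores(x, w_k, pos_bias)
out = attend(s_asym, x, w_v)

print(np.allclose(s_kv, s_kv.T))      # True: K K^T is symmetric
print(np.allclose(s_qkv, s_qkv.T))    # False for these random weights
print(np.allclose(s_asym, s_asym.T))  # False: the positional term breaks symmetry
```

Because `s_kv` is symmetric, the raw affinity between tokens i and j is the same in both directions. The added positional term is one way to restore direction-dependent attention, which order-sensitive tasks such as list reversal presumably need.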

Code implementations: 1 repo
