LG | Oct 28, 2023

Inverse distance weighting attention

arXiv:2310.18805v2 | 4 citations | h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work addresses interpretability for researchers and practitioners in machine learning, but it is incremental as it modifies an existing attention mechanism without demonstrating broad performance gains.

The paper tackles interpretability in neural networks by replacing scaled dot-product attention with inverse distance weighting. The resulting networks tend to learn a key matrix whose rows act as prototypes and a value matrix holding the corresponding logits, and prototypes can be added by hand to cover special cases.
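The reduction to inverse distance weighting can be written out in one step. The sketch below assumes a single query q, keys k_i, values v_i, and distances d_i = ||q - k_i||; these symbols are chosen here for illustration and are not taken from the paper. It also assumes the attention score is exactly the negative log of the distance, as the abstract describes, with no extra scaling.

```latex
% Assumed notation: q = query, k_i = keys, v_i = values, d_i = \lVert q - k_i \rVert_2
\begin{aligned}
w_i &= \operatorname{softmax}_i\!\left(-\log d_i\right)
     = \frac{\exp(-\log d_i)}{\sum_j \exp(-\log d_j)}
     = \frac{d_i^{-1}}{\sum_j d_j^{-1}}, \\
\hat{y}(q) &= \sum_i w_i \, v_i .
\end{aligned}
```

Each weight is proportional to the reciprocal of the distance, so the output is an inverse-distance-weighted average of the values, and a query that lands on a key is dominated by that key's value.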

We report the effects of replacing the scaled dot-product (within softmax) attention with the negative-log of Euclidean distance. This form of attention simplifies to inverse distance weighting interpolation. Used in simple one hidden layer networks and trained with vanilla cross-entropy loss on classification problems, it tends to produce a key matrix containing prototypes and a value matrix with corresponding logits. We also show that the resulting interpretable networks can be augmented with manually-constructed prototypes to perform low-impact handling of special cases.
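A minimal NumPy sketch of the mechanism described above, assuming the attention weights are softmax(-log distance) with no additional scaling. The function name `idw_attention`, the `eps` guard, and the toy keys, values, and queries are hypothetical illustrations, not the paper's code or data. The second half appends one hand-made prototype row to the key and value matrices to illustrate the special-case handling the abstract mentions.

```python
import numpy as np

def idw_attention(queries, keys, values, eps=1e-12):
    """Attention with scores -log(Euclidean distance), softmaxed over keys.

    Hypothetical helper: eps guards against log(0) when a query equals a key.
    Returns (outputs, weights).
    """
    # Pairwise Euclidean distances, shape (n_queries, n_keys).
    diffs = queries[:, None, :] - keys[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Negative-log distance scores, then a numerically stable softmax.
    scores = -np.log(dists + eps)
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

# Two prototypes (keys) with per-class logits (values); toy numbers.
keys = np.array([[0.0, 0.0], [4.0, 0.0]])
values = np.array([[5.0, -5.0], [-5.0, 5.0]])
queries = np.array([[1.0, 0.0]])

out, w = idw_attention(queries, keys, values)
# Distances are 1 and 3, so the weights are (1/1, 1/3) normalized.

# Manually add a prototype near the query that votes for class 1.
keys_aug = np.vstack([keys, [[1.0, 0.1]]])
values_aug = np.vstack([values, [[-5.0, 5.0]]])
out_aug, w_aug = idw_attention(queries, keys_aug, values_aug)
```

In this toy setup the original prediction for the query is class 0, and the added prototype, being much closer to the query than either learned key, pulls the prediction to class 1. Because the weights fall off with distance, the added row has less influence on queries far from it.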

Code Implementations: 1 repo