Relational inductive biases on attention mechanisms
This work provides a theoretical framework for understanding how attention mechanisms generalize, an incremental contribution for researchers in geometric deep learning.
The paper analyzes the relational inductive biases in attention mechanisms by examining their equivariance properties under permutation subgroups, leading to a classification based on these biases.
Inductive learning aims to construct general models from specific examples, guided by biases that influence hypothesis selection and determine generalization capacity. In this work, we focus on characterizing the relational inductive biases present in attention mechanisms, understood as assumptions about the underlying relationships between data elements. From the perspective of geometric deep learning, we analyze the most common attention mechanisms in terms of their equivariance properties with respect to permutation subgroups, which allows us to propose a classification based on their relational biases. Under this perspective, we show that different attention layers are characterized by the underlying relationships they assume on the input data.
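As a concrete illustration of the equivariance property discussed above, the following minimal sketch checks numerically whether a single-head dot-product self-attention layer commutes with a permutation of its input tokens. It is not taken from the paper: the function names (`self_attention`, `is_equivariant`), the random weights, and the choice of a causal mask as the contrasting case are assumptions made for illustration only. Unmasked self-attention is expected to be equivariant under any permutation, while a causal mask assumes an ordering on the tokens and so breaks equivariance for permutations that do not preserve that ordering.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, mask=None):
    # Single-head scaled dot-product self-attention over the rows (tokens) of X.
    # mask[i, j] == True means token i is allowed to attend to token j.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ V

def is_equivariant(layer, X, perm):
    # Equivariance: permuting the input rows permutes the output rows the same way.
    return np.allclose(layer(X[perm]), layer(X)[perm])

rng = np.random.default_rng(0)
n, d = 5, 4                                   # hypothetical sizes
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = np.arange(n)[::-1]                     # reverse the token order
causal = np.tril(np.ones((n, n), dtype=bool)) # token i attends to tokens j <= i

full_equivariant = is_equivariant(
    lambda Z: self_attention(Z, Wq, Wk, Wv), X, perm)
causal_equivariant = is_equivariant(
    lambda Z: self_attention(Z, Wq, Wk, Wv, mask=causal), X, perm)

print("full attention equivariant:  ", full_equivariant)
print("causal attention equivariant:", causal_equivariant)
```

In this sketch the mask plays the role of the assumed relationship between data elements: the all-ones mask treats the input as an unordered set, whereas the lower-triangular mask treats it as a sequence. Only permutations that leave the mask unchanged remain symmetries of the layer.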