LGMLFeb 13, 2024

Approximation of relation functions and attention mechanisms

arXiv:2402.08856v28 citationsh-index: 5
AI Analysis

This work addresses the theoretical underpinnings of relation modeling in machine learning, with implications for attention mechanisms, but it is incremental as it builds on existing approximation theory.

The paper tackled the problem of approximating symmetric and asymmetric relation functions using inner products of neural networks, proving they are universal approximators and providing bounds on neuron counts for given accuracy. It applied these results to show that any retrieval mechanism defined by an abstract preorder can be approximated by attention mechanisms in Transformers.

Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem in economics to represent preference relations in terms of utility functions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes