LG ML | May 3, 2019

Convolution, attention and structure embedding

arXiv:1905.01289v59.927 citations

Originality: Synthesis-oriented

AI Analysis

The paper provides a theoretical foundation for analyzing and enriching diverse machine learning models. Its contribution is incremental: it unifies existing concepts rather than introducing a new paradigm.

The paper addresses how to keep the parameters of linear operations from exploding when deep neural networks embed structured data. It presents a unified framework that captures the essence of convolution models across different types of structure, and shows that attention models fit the same framework as adaptive convolution.

Deep neural networks are composed of layers of parametrised linear operations intertwined with non-linear activations. In basic models, such as the multi-layer perceptron, a linear layer operates on a simple input vector embedding of the instance being processed, and produces an output vector embedding by straight multiplication by a matrix parameter. In more complex models, the input and output are structured and their embeddings are higher order tensors. The parameter of each linear operation must then be controlled so as not to explode with the complexity of the structures involved. This is essentially the role of convolution models, which exist in many flavours depending on the type of structure they deal with (grids, networks, time series, etc.). We present here a unified framework which aims at capturing the essence of these diverse models, allowing a systematic analysis of their properties and their mutual enrichment. We also show that attention models naturally fit in the same framework: attention is convolution in which the structure itself is adaptive, and learnt, instead of being given a priori.
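
The contrast between a structure given a priori and a learnt one can be illustrated with a small sketch. It is not the paper's formalism: it assumes a simplified layer of the form Y = S X Θ, where S is a position-mixing "structure" matrix and Θ is a shared feature parameter whose size does not depend on the number of positions. The function names (`structured_linear`, `attention_structure`), the toy path graph and all numeric values are hypothetical, chosen only for illustration. With a fixed S the layer behaves like a simple graph convolution; with S computed from the input by scaled dot-product scores it behaves like a single attention head.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Row-wise softmax, so that each row sums to 1."""
    out = []
    for row in m:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def structured_linear(structure, x, theta):
    """Y = structure @ X @ theta: mix across positions, then across features."""
    return matmul(matmul(structure, x), theta)

def attention_structure(x, w_query, w_key):
    """Structure matrix computed from the input itself (scaled dot-product)."""
    q, k = matmul(x, w_query), matmul(x, w_key)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = [[v / scale for v in row] for row in matmul(q, k_t)]
    return softmax_rows(scores)

# Three positions with two features each (toy values, assumed for illustration).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [[1.0, 0.0], [0.0, 1.0]]  # shared 2x2 parameter, independent of n

# Structure given a priori: path graph 0-1-2 with self-loops, rows normalised.
path = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.5, 0.5]]
y_fixed = structured_linear(path, x, theta)

# Adaptive structure: the mixing weights depend on x via query/key parameters.
w_query = [[1.0, 0.0], [0.0, 1.0]]
w_key = [[1.0, 0.0], [0.0, 1.0]]
attn = attention_structure(x, w_query, w_key)
y_attn = structured_linear(attn, x, theta)
```

In both calls the only feature-mixing parameter is the 2x2 matrix `theta`, whereas an unconstrained linear map between the flattened 3x2 input and 3x2 output would need a 6x6 matrix. In this sketch, that sharing is how the parameter is kept from growing with the size of the structure.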

