LG DIS-NN CV NC ML

Feb 14, 2023

Energy Transformer

Benjamin Hoover, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik Strobelt, Duen Horng Chau, Mohammed J. Zaki, Dmitry Krotov

Georgia Tech, IBM

arXiv:2302.07253v2 · 29.790 citations · h-index: 72 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses theoretical gaps in attention-based models. It appears incremental because it combines three existing paradigms (attention, energy-based models, and associative memory) rather than introducing a new one.

The paper tackles two problems: attention mechanisms lack clear theoretical foundations, and the energy function of an energy-based model is hard to design. It proposes the Energy Transformer, whose attention layers are designed to minimize a specifically engineered energy function, and reports strong quantitative results on graph anomaly detection and graph classification.

Our work combines aspects of three promising paradigms in machine learning, namely the attention mechanism, energy-based models, and associative memory. Attention is the powerhouse driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not straightforward. At the same time, Dense Associative Memory models, or Modern Hopfield Networks, have a well-established theoretical foundation and allow an intuitive design of the energy function. We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks.
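The idea of attention layers that minimize an energy function can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: it uses a single head, a log-sum-exp energy over query-key overlaps chosen here for illustration, and plain gradient descent on the token vectors. The abstract does not spell out the energy function, so the form below, the omission of any normalization or associative-memory term, and the names `attention_energy`, `energy_gradient`, `descend`, `Wq`, `Wk`, `beta`, and `step` are all assumptions.

```python
import numpy as np

def attention_energy(x, Wq, Wk, beta):
    """Toy energy (an assumption, not taken from the abstract):
    E = -(1/beta) * sum_i log sum_{j != i} exp(beta * q_i . k_j)."""
    A = beta * (x @ Wq) @ (x @ Wk).T
    np.fill_diagonal(A, -np.inf)              # a token does not attend to itself
    m = A.max(axis=1, keepdims=True)          # stabilize the log-sum-exp
    lse = m[:, 0] + np.log(np.exp(A - m).sum(axis=1))
    return -lse.sum() / beta

def energy_gradient(x, Wq, Wk, beta):
    """Analytic gradient of the toy energy with respect to the tokens x."""
    Q, K = x @ Wq, x @ Wk
    A = beta * Q @ K.T
    np.fill_diagonal(A, -np.inf)
    P = np.exp(A - A.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)         # row-wise softmax, zero on the diagonal
    # dE/dQ = -P K and dE/dK = -P^T Q, pushed back through the projections.
    return -(P @ K) @ Wq.T - (P.T @ Q) @ Wk.T

def descend(x, Wq, Wk, beta=1.0, step=0.01, n_steps=50):
    """Update the tokens by gradient descent and record the energy at each step."""
    energies = [attention_energy(x, Wq, Wk, beta)]
    for _ in range(n_steps):
        x = x - step * energy_gradient(x, Wq, Wk, beta)
        energies.append(attention_energy(x, Wq, Wk, beta))
    return x, energies

# Hypothetical sizes: 5 tokens of dimension 8, projected to 4 dimensions.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(5, 8))
Wq = 0.3 * rng.normal(size=(8, 4))
Wk = 0.3 * rng.normal(size=(8, 4))
x_final, energies = descend(x0, Wq, Wk)
print("energy before:", energies[0], "after:", energies[-1])
```

Each update here takes the form of a softmax-weighted mixture of projected tokens, which is why a descent step on this kind of energy resembles an attention layer. The toy energy is unbounded below if the tokens are allowed to grow, so the sketch is only meaningful for a small number of small steps.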

