LG · DIS-NN · CV · NC · ML · Feb 14, 2023

Energy Transformer

Georgia Tech · IBM
arXiv:2302.07253v2 · 87 citations · h-index: 72
AI Analysis

This work addresses theoretical gaps in attention-based models and is relevant to machine learning practitioners, though it may be seen as incremental because it combines existing paradigms.

The paper tackles two problems: attention mechanisms lack theoretical foundations, and energy functions for energy-based models are hard to design. It proposes the Energy Transformer, which integrates attention layers with a specifically engineered energy function, and reports strong quantitative results on graph anomaly detection and graph classification.

Our work combines aspects of three promising paradigms in machine learning, namely attention mechanisms, energy-based models, and associative memory. Attention is the powerhouse driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not straightforward. At the same time, Dense Associative Memory models, or Modern Hopfield Networks, have a well-established theoretical foundation and allow an intuitive design of the energy function. We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks.
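The abstract describes attention layers that minimize an engineered energy over the tokens but does not state that energy. The following is a minimal NumPy sketch of the general idea under assumptions of our own, not the paper's exact formulation. It uses a single head and no token normalization. The energy has two terms: an attention term, the scaled negative log-sum-exp of each token's query-key scores against the other tokens, and a Hopfield-style memory term built from ReLU-squared overlaps with stored patterns. Each "layer" is one explicit gradient-descent step on the tokens, so the recorded energy should fall from step to step. The names `energy_and_grad`, `descend`, `Wq`, `Wk`, `Xi`, `beta`, and `step` are hypothetical.

```python
import numpy as np

def energy_and_grad(X, Wq, Wk, Xi, beta):
    """Assumed toy energy E(X) = E_att + E_hn and its analytic gradient dE/dX.

    X: (N, D) tokens; Wq, Wk: (Y, D) query/key maps; Xi: (M, D) memories.
    """
    Q = X @ Wq.T                      # (N, Y) queries
    K = X @ Wk.T                      # (N, Y) keys
    S = beta * (Q @ K.T)              # S[C, B] = beta * <Q_C, K_B>
    np.fill_diagonal(S, -np.inf)      # a token does not attend to itself (assumption)
    m = S.max(axis=1, keepdims=True)
    expS = np.exp(S - m)
    Z = expS.sum(axis=1, keepdims=True)
    lse = (m + np.log(Z)).ravel()     # row-wise log-sum-exp
    P = expS / Z                      # row-wise softmax (attention weights)
    e_att = -lse.sum() / beta         # attention energy (assumed form)

    h = X @ Xi.T                      # (N, M) overlaps with memories
    r = np.maximum(h, 0.0)
    e_hn = -0.5 * np.sum(r ** 2)      # Hopfield-style energy (assumed form)

    # Chain rule: dE/d(QK^T) = -P, then back through Q = X Wq^T and K = X Wk^T.
    grad_att = -(P @ K) @ Wq - (P.T @ Q) @ Wk
    grad_hn = -r @ Xi
    return e_att + e_hn, grad_att + grad_hn

def descend(X, Wq, Wk, Xi, beta=1.0, step=0.005, n_steps=10):
    """Run n_steps of gradient descent on the tokens; return final X and energies."""
    energies = []
    for _ in range(n_steps):
        e, g = energy_and_grad(X, Wq, Wk, Xi, beta)
        energies.append(e)
        X = X - step * g              # one "energy-minimizing layer"
    energies.append(energy_and_grad(X, Wq, Wk, Xi, beta)[0])
    return X, energies

# Hypothetical sizes: tokens, token dim, query/key dim, number of memories.
rng = np.random.default_rng(0)
N, D, Y, M = 6, 8, 4, 10
X0 = rng.normal(size=(N, D))
Wq = rng.normal(size=(Y, D)) / np.sqrt(D)
Wk = rng.normal(size=(Y, D)) / np.sqrt(D)
Xi = rng.normal(size=(M, D)) / np.sqrt(D)

X_final, energies = descend(X0, Wq, Wk, Xi)
print([round(float(e), 3) for e in energies])
```

Without some normalization of the tokens, both terms are unbounded below, so this toy version is only meaningful for a few small steps. A fuller design would likely normalize tokens before computing the energy and use multiple heads.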

Code Implementations: 4 repos
