LG, AI | Sep 13, 2024

Transformers from Diffusion: A Unified Framework for Neural Message Passing

arXiv:2409.09111v46.42 citations | h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses a fundamental problem in machine learning: handling structured data whose geometry may be observed or unobserved. It offers a unified framework, and its contribution is assessed as an incremental advance.

The paper tackles representation learning for structured data with geometries by proposing an energy-constrained diffusion model. The model unifies neural message passing architectures such as MLPs, GNNs, and Transformers, and it yields a new model called DIFFormer. DIFFormer achieves promising performance across diverse datasets, including real-world networks, images, texts, and physical particles.

Learning representations for structured data with certain geometries (e.g., observed or unobserved) is a fundamental challenge, wherein message passing neural networks (MPNNs) have become a de facto class of model solutions. In this paper, inspired by physical systems, we propose an energy-constrained diffusion model, which integrates the inductive bias of diffusion on manifolds with layer-wise constraints of energy minimization. We identify that the diffusion operators have a one-to-one correspondence with the energy functions implicitly descended by the diffusion process, and the finite-difference iteration for solving the energy-constrained diffusion system induces the propagation layers of various types of MPNNs operating on observed or latent structures. This leads to a unified mathematical framework for common neural architectures whose computational flows can be cast as message passing (or its special case), including MLPs, GNNs, and Transformers. Building on these insights, we devise a new class of neural message passing models, dubbed diffusion-inspired Transformers (DIFFormer), whose global attention layers are derived from the principled energy-constrained diffusion framework. Across diverse datasets ranging from real-world networks to images, texts, and physical particles, we demonstrate that the new model achieves promising performance in scenarios where the data structures are observed (as a graph), partially observed, or entirely unobserved.
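The link between a finite-difference diffusion step and an attention-style propagation layer can be illustrated with a minimal sketch. It assumes an explicit Euler step of dZ/dt = (S - I)Z, where S is a row-normalized all-pair similarity computed as a dot-product softmax. The function names, the step size `tau`, and the choice of softmax similarity are assumptions made for this illustration; they are not taken from the abstract and may differ from the actual DIFFormer layer.

```python
import math


def softmax_rows(scores):
    """Row-wise softmax, so every row of the result sums to 1."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def diffusion_step(z, tau=0.5):
    """One explicit Euler step of dZ/dt = (S - I) Z.

    z   : list of n feature vectors (lists of floats), one per instance
    tau : step size in [0, 1] (hypothetical name for this sketch)

    S is an assumed all-pair diffusivity: softmax over dot products.
    The update is z_i <- (1 - tau) * z_i + tau * sum_j S_ij * z_j,
    which has the same shape as a residual global-attention layer.
    """
    n = len(z)
    d = len(z[0])
    # Pairwise dot-product similarities between all instances.
    scores = [
        [sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)]
        for i in range(n)
    ]
    s = softmax_rows(scores)
    new_z = []
    for i in range(n):
        # Aggregate messages from every instance, weighted by S.
        agg = [sum(s[i][j] * z[j][k] for j in range(n)) for k in range(d)]
        new_z.append([(1 - tau) * z[i][k] + tau * agg[k] for k in range(d)])
    return new_z


if __name__ == "__main__":
    z = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
    for _ in range(3):
        z = diffusion_step(z, tau=0.5)
    print(z)
```

Because each updated vector is a convex combination of the current vectors, repeated steps pull the features toward one another. Setting `tau=0` leaves the input unchanged, which corresponds to a layer with no message passing, and restricting S to observed edges instead of all pairs would give a GNN-style update.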
