CL · AI · LG · Oct 27, 2023

Transformers as Graph-to-Graph Models

arXiv:2310.17936v1 · 133 citations · h-index: 33 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses graph prediction in natural language processing. Its contribution is a method for integrating explicit graphs into existing pretrained Transformers, an incremental advance that adapts the standard architecture rather than replacing it.

The paper tackles the modelling of linguistic structures by arguing that Transformers are graph-to-graph models, and introduces a Graph-to-Graph Transformer architecture that integrates explicit graphs into pretrained Transformers. The result is state-of-the-art accuracy on a variety of linguistic structures, without the need for custom pipelines.
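The page does not say how explicit edges enter a pretrained Transformer. One standard way, and a plausible reading of "inputting graph edges into the attention weight computations", is the relation-aware attention of Shaw et al. (2018): each edge label has a learned vector that is added to the key and value of the attended token. The sketch below is a single attention head in plain Python under that assumption; `graph_attention`, the convention that label 0 means "no edge", and the per-label tables `edge_k` / `edge_v` are hypothetical names, not the paper's code.

```python
import math

def matvec(W, x):
    """Multiply a matrix W (list of rows) by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def graph_attention(X, labels, Wq, Wk, Wv, edge_k, edge_v):
    """One self-attention head conditioned on an input graph (hypothetical sketch).

    X:       n token vectors
    labels:  n x n ints; labels[i][j] is the label of edge i -> j (assumed 0 = no edge)
    edge_k:  per-label vectors added to keys (edge_k[0] should be all zeros)
    edge_v:  per-label vectors added to values (edge_v[0] should be all zeros)
    """
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for i in range(len(X)):
        # The edge label between i and j shifts the key, and hence the attention score.
        scores = []
        for j in range(len(X)):
            k = [a + b for a, b in zip(K[j], edge_k[labels[i][j]])]
            scores.append(sum(q * kk for q, kk in zip(Q[i], k)) / math.sqrt(d))
        weights = softmax(scores)
        # The same edge label also shifts the value that token j contributes.
        o = [0.0] * d
        for j, w in enumerate(weights):
            v = [a + b for a, b in zip(V[j], edge_v[labels[i][j]])]
            o = [oo + w * vv for oo, vv in zip(o, v)]
        out.append(o)
    return out
```

With all-zero edge tables the function reduces to ordinary scaled dot-product attention; a large key embedding on a labelled edge i -> j pulls token i's attention towards token j, which is how an explicit graph can bias the latent attention graph.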

We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimise the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.
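The output side and the refinement loop can be sketched in the same way. The abstract says edges are predicted "with attention-like functions" and that iterative refinement feeds the graph back into the model. The sketch below assumes a bilinear score `H[i]^T U H[j]` (an attention logit with a learned matrix `U`), an independent argmax over candidate heads for each token, so that all edges are predicted at once rather than autoregressively, and a toy `encode` function standing in for the encoder; the paper presumably conditions a full Graph-to-Graph Transformer on the previous prediction instead. All names here are hypothetical.

```python
def encode(X, heads, mix=0.5):
    """Hypothetical stand-in for a graph-conditioned encoder: each token's
    vector is combined with the vector of its currently predicted head."""
    out = []
    for i, x in enumerate(X):
        h = heads[i]
        if h is None:
            out.append(list(x))
        else:
            out.append([a + mix * b for a, b in zip(x, X[h])])
    return out

def edge_scores(H, U):
    """Attention-like bilinear scores: S[i][j] = H[i]^T U H[j] for edge i -> j."""
    def bilinear(a, b):
        return sum(a[p] * U[p][q] * b[q] for p in range(len(a)) for q in range(len(b)))
    return [[bilinear(hi, hj) for hj in H] for hi in H]

def predict_heads(S):
    """Non-autoregressive decoding: every token picks its best head
    independently of the others (self-loops excluded)."""
    heads = []
    for i, row in enumerate(S):
        candidates = [j for j in range(len(row)) if j != i]
        heads.append(max(candidates, key=lambda j: row[j]))
    return heads

def refine(X, U, max_iters=3):
    """Predict a complete graph, re-encode with it, and repeat."""
    heads = [None] * len(X)  # start from an empty graph
    for _ in range(max_iters):
        new_heads = predict_heads(edge_scores(encode(X, heads), U))
        if new_heads == heads:
            break  # fixed point: another pass would not change the graph
        heads = new_heads
    return heads
```

Each pass scores every candidate edge in parallel, so the whole graph is revised at once. Independent argmax can produce cycles; enforcing a tree would need an extra constraint that this sketch leaves out.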

Code Implementations: 1 repo