LGApr 29, 2025

SFi-Former: Sparse Flow Induced Attention for Graph Transformer

Zhonghao Li, Ji Shi, Xinming Zhang, Miao Zhang, Bo Li

arXiv:2504.20666v17.11 citationsh-index: 5ICMR

Originality Incremental advance

AI Analysis

This addresses issues in graph learning for tasks with long-range dependencies, offering an incremental improvement over existing Graph Transformers.

The paper tackles the problems of weak inductive bias, overfitting, and over-globalization in Graph Transformers by introducing SFi-attention, a sparse attention mechanism based on network flows with l1-norm regularization, and SFi-Former, which leverages this to achieve competitive performance on GNN Benchmark datasets and state-of-the-art results on LongRange Graph Benchmark datasets, with smaller generalization gaps indicating reduced overfitting.

Graph Transformers (GTs) have demonstrated superior performance compared to traditional message-passing graph neural networks in many studies, especially in processing graph data with long-range dependencies. However, GTs tend to suffer from weak inductive bias, overfitting and over-globalizing problems due to the dense attention. In this paper, we introduce SFi-attention, a novel attention mechanism designed to learn sparse pattern by minimizing an energy function based on network flows with l1-norm regularization, to relieve those issues caused by dense attention. Furthermore, SFi-Former is accordingly devised which can leverage the sparse attention pattern of SFi-attention to generate sparse network flows beyond adjacency matrix of graph data. Specifically, SFi-Former aggregates features selectively from other nodes through flexible adaptation of the sparse attention, leading to a more robust model. We validate our SFi-Former on various graph datasets, especially those graph data exhibiting long-range dependencies. Experimental results show that our SFi-Former obtains competitive performance on GNN Benchmark datasets and SOTA performance on LongRange Graph Benchmark (LRGB) datasets. Additionally, our model gives rise to smaller generalization gaps, which indicates that it is less prone to over-fitting. Click here for codes.

View on arXiv PDF

Similar