DAM-GT: Dual Positional Encoding-Based Attention Masking Graph Transformer for Node Classification
This work addresses node classification in graph learning. It improves on existing methods, but the contribution appears incremental because it builds on the established Transformer paradigm.
The paper tackles two limitations of neighborhood-aware graph Transformers for node classification by proposing DAM-GT, which combines dual positional encoding with attention masking to better capture attribute correlations and reduce attention diversion. The authors report that DAM-GT consistently outperforms state-of-the-art methods across graphs of varying homophily and scale.
Neighborhood-aware tokenized graph Transformers have recently shown great potential for node classification tasks. Despite their effectiveness, our in-depth analysis of neighborhood tokens reveals two critical limitations in the existing paradigm. First, current neighborhood token generation methods fail to adequately capture attribute correlations within a neighborhood. Second, the conventional self-attention mechanism suffers from attention diversion when processing neighborhood tokens, where high-hop neighborhoods receive disproportionate focus, severely disrupting information interactions between the target node and its neighborhood tokens. To address these challenges, we propose DAM-GT, a Dual positional encoding-based Attention Masking graph Transformer. DAM-GT introduces a novel dual positional encoding scheme that incorporates attribute-aware encoding via an attribute clustering strategy, effectively preserving node correlations in both topological and attribute spaces. In addition, DAM-GT formulates a new attention mechanism with a simple yet effective masking strategy to guide interactions between target nodes and their neighborhood tokens, overcoming the issue of attention diversion. Extensive experiments on various graphs with different homophily levels as well as different scales demonstrate that DAM-GT consistently outperforms state-of-the-art methods in node classification tasks.
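The abstract does not spell out the masking rule, so the following is only a minimal sketch of how a target-centric attention mask over neighborhood tokens might look, not the authors' implementation. It assumes that each node's token sequence is ordered as [target, hop-1, ..., hop-K], that hop tokens come from repeated row-normalized propagation, and that a neighborhood token may attend only to itself and to the target token. The function names `hop_tokens`, `target_centric_mask`, and `masked_attention` are hypothetical, and the learned query, key, and value projections of a real Transformer layer are omitted for brevity.

```python
import numpy as np


def hop_tokens(adj, x, num_hops):
    """Build per-node token sequences [target, hop-1, ..., hop-K].

    Assumption: hop-k tokens are obtained by propagating features k times
    over a row-normalized adjacency matrix (one common choice, not
    necessarily the one used in DAM-GT).
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0              # avoid division by zero for isolated nodes
    a_norm = adj / deg
    tokens = [x]                     # position 0 holds the target node itself
    h = x
    for _ in range(num_hops):
        h = a_norm @ h
        tokens.append(h)
    return np.stack(tokens, axis=1)  # shape: (num_nodes, num_hops + 1, dim)


def target_centric_mask(seq_len):
    """Boolean mask where True means attention is allowed.

    Assumption: the target token (index 0) interacts with every token,
    while each neighborhood token sees only itself and the target.
    """
    mask = np.eye(seq_len, dtype=bool)
    mask[0, :] = True
    mask[:, 0] = True
    return mask


def masked_attention(tokens, mask):
    """Scaled dot-product self-attention restricted by a boolean mask."""
    dim = tokens.shape[-1]
    scores = tokens @ tokens.swapaxes(-1, -2) / np.sqrt(dim)
    scores = np.where(mask, scores, -np.inf)              # block disallowed pairs
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens, weights
```

Under these assumptions, neighborhood tokens cannot exchange attention with one another, so a high-hop token can influence the representation only through its interaction with the target node. That is one plausible way to limit the attention diversion described above.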