Gated Rotary-Enhanced Linear Attention with Rank Modulation for Long-term Sequential Recommendation
This work addresses efficiency and accuracy issues in sequential recommendation systems for users with long behavior histories, representing an incremental improvement over existing linear attention methods.
The paper tackles the computational and memory challenges of Transformer models in long-term sequential recommendation by proposing RecGRELA. The model uses rotary-enhanced linear attention to model long-range dependencies efficiently and adaptive rank modulation to balance long-term and short-term interests. It achieves state-of-the-art performance on four benchmark datasets with low memory overhead.
In Sequential Recommendation Systems (SRSs), Transformer models have demonstrated remarkable performance but face computational and memory cost challenges, especially when modeling long-term user behavior sequences. Because of its quadratic complexity, the dot-product attention mechanism in Transformers becomes expensive on long sequences. Linear attention approximates dot-product attention with carefully designed mapping functions and offers a more efficient alternative with linear complexity. However, existing linear attention methods have three limitations: 1) they often use learnable position encodings, which incur extra computational costs in long-sequence scenarios; 2) limited by a low-rank deficiency, they may not sufficiently account for users' fine-grained local preferences (short-lived bursts of interest); and 3) they try to capture transient activities but often confuse them with stable, long-term interests, which can result in unclear or less effective recommendations. To remedy these drawbacks, we propose a long-term sequential Recommendation model with Gated Rotary-Enhanced Linear Attention (RecGRELA). Specifically, we first propose a Rotary-Enhanced Linear Attention (RELA) module that uses rotary position encodings to efficiently model long-range dependencies within the user's historical information. Then, to address the low-rank deficiency of linear attention, we introduce an Adaptive Rank Modulator. It incorporates a rank augmentation branch to explicitly inject local token mixing and a Gated Rank Selector to dynamically balance stable long-term preferences and transient short-term interests. Experimental results on four public benchmark datasets show that RecGRELA achieves state-of-the-art performance compared with existing SRSs based on Recurrent Neural Networks, Transformers, and Mamba while keeping memory overhead low.
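The efficiency argument above can be illustrated with a minimal sketch. It is not the authors' RELA module. The feature map (`elu(x) + 1`), the non-causal summation over the whole sequence, the choice to apply the rotary rotation only in the numerator (one common way to combine rotary encodings with linear attention), and all function names are assumptions made for illustration. The sketch shows that reordering the computation gives the same output as the quadratic form without ever building the n x n score matrix.

```python
import math
import random

def feature_map(row):
    # Assumed mapping function: elu(x) + 1, which keeps every feature positive.
    return [x + 1.0 if x > 0 else math.exp(x) for x in row]

def rotary(row, pos):
    # Rotate consecutive channel pairs by an angle that depends on the position.
    # Assumes an even feature dimension.
    d = len(row)
    out = []
    for i in range(0, d, 2):
        theta = pos / (10000.0 ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        a, b = row[i], row[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def linear_attention(Q, K, V):
    # O(n * d * dv): summarize keys and values once, then reuse for every query.
    d, dv = len(Q[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum_j rot(phi(k_j)) v_j^T
    z = [0.0] * d                        # sum_j phi(k_j), for normalization
    for j, (k, v) in enumerate(zip(K, V)):
        fk = feature_map(k)
        rk = rotary(fk, j)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                kv[a][b] += rk[a] * v[b]
    out = []
    for i, q in enumerate(Q):
        fq = feature_map(q)
        rq = rotary(fq, i)
        den = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(rq[a] * kv[a][b] for a in range(d)) / den
                    for b in range(dv)])
    return out

def quadratic_attention(Q, K, V):
    # O(n^2 * d) reference: explicit pairwise scores, same kernel as above.
    FK = [feature_map(k) for k in K]
    RK = [rotary(fk, j) for j, fk in enumerate(FK)]
    out = []
    for i, q in enumerate(Q):
        fq = feature_map(q)
        rq = rotary(fq, i)
        scores = [sum(x * y for x, y in zip(rq, rk)) for rk in RK]
        den = sum(sum(x * y for x, y in zip(fq, fk)) for fk in FK)
        out.append([sum(s * v[b] for s, v in zip(scores, V)) / den
                    for b in range(len(V[0]))])
    return out

# Toy check on random data: both orderings agree up to rounding error.
random.seed(0)
n, d = 6, 4
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
fast = linear_attention(Q, K, V)
slow = quadratic_attention(Q, K, V)
max_diff = max(abs(a - b) for rf, rs in zip(fast, slow) for a, b in zip(rf, rs))
```

Because the rotary encoding has no trainable parameters, this sketch needs no position-embedding table. The implicit score matrix is a product of two n x d factors, so its rank is at most d, which is the low-rank deficiency the abstract refers to. A causal variant would replace the full sums `kv` and `z` with running prefix sums.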