SOLAR: SVD-Optimized Lifelong Attention for Recommendation
This work addresses efficient attention for recommender systems, particularly in scenarios with long-context modeling and large candidate sets.
The authors tackle the cost of attention in Transformers: their SVD-Attention method reduces complexity from $O(N^2 d)$ to $O(Ndr)$ and yields a 0.68% Video Views gain in Kuaishou's online recommendation scenario. This reduction enables sequence modeling over behavior sequences on the order of ten thousand items and candidate sets of several thousand items.
The attention mechanism remains the defining operator in Transformers because it provides expressive global credit assignment, yet its $O(N^2 d)$ time and memory cost in sequence length $N$ makes long-context modeling expensive and often forces truncation or other heuristics. Linear attention reduces the complexity to $O(N d^2)$ by reordering computation through kernel feature maps, but this reformulation drops the softmax and shifts the attention score distribution. In recommender systems, low-rank structure in matrices is not a rare case but the default inductive bias of representation learning, and it is particularly explicit in user behavior sequence modeling. Leveraging this structure, we introduce SVD-Attention, which is theoretically lossless on low-rank matrices and preserves softmax while reducing attention complexity from $O(N^2 d)$ to $O(Ndr)$. Building on SVD-Attention, we propose SOLAR (SVD-Optimized Lifelong Attention for Recommendation), a sequence modeling framework that supports behavior sequences on the order of ten thousand items and candidate sets of several thousand items in the cascading process without any filtering. In Kuaishou's online recommendation scenario, SOLAR delivers a 0.68% Video Views gain together with improvements in additional business metrics.
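The two observations the abstract relies on can be checked numerically with a small NumPy sketch. It is not an implementation of SVD-Attention, whose algorithm is not specified in this abstract. The first part shows how linear attention avoids the $N \times N$ matrix by reordering the matrix products. The second part shows that a rank-$r$ key matrix is recovered exactly by its rank-$r$ truncated SVD, so the softmax weights computed from the factors match the original ones. The feature map (ELU + 1), the function names, and the sizes `N`, `d`, `r` are all illustrative assumptions.

```python
import numpy as np

def feature_map(x):
    # Illustrative positive feature map, ELU(x) + 1 (an assumption, not from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention_quadratic(Q, K, V):
    # Materialises the N x N score matrix: O(N^2 d) time.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    scores = phi_q @ phi_k.T                      # (N, N)
    return (scores @ V) / scores.sum(axis=1, keepdims=True)

def linear_attention_reordered(Q, K, V):
    # Same output via associativity, never forming N x N: O(N d^2) time.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V                              # (d, d)
    z = phi_k.sum(axis=0)                         # (d,)
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def softmax_rows(scores):
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, d, r = 128, 16, 4                              # hypothetical sizes
Q = rng.standard_normal((N, d))
V = rng.standard_normal((N, d))
K = rng.standard_normal((N, r)) @ rng.standard_normal((r, d))  # exactly rank r

# Part 1: reordering changes the cost of linear attention, not its result.
out_quadratic = linear_attention_quadratic(Q, K, V)
out_reordered = linear_attention_reordered(Q, K, V)

# Part 2: rank-r truncated SVD of a rank-r matrix is lossless.
U, s, Vt = np.linalg.svd(K, full_matrices=False)
U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r]
K_hat = (U_r * s_r) @ Vt_r                        # (N, d) reconstruction
svd_error = np.abs(K - K_hat).max()

# Logits computed through the r-dimensional factors match Q @ K.T, so the
# softmax weights are unchanged. This still forms N x N logits; it checks
# exactness only and does not demonstrate the O(Ndr) algorithm.
logits_full = Q @ K.T
logits_factored = (Q @ Vt_r.T) @ (U_r * s_r).T
softmax_gap = np.abs(softmax_rows(logits_full) - softmax_rows(logits_factored)).max()
```

Both `svd_error` and `softmax_gap` should sit at floating-point noise level, whereas the linear-attention outputs differ from softmax attention by construction, since the kernel feature map replaces the exponential.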