IR · AI · Mar 11

Beyond Interleaving: Causal Attention Reformulations for Generative Recommender Systems

arXiv:2603.10369v1 · 7.32 citations · h-index: 5
Predicted impact: top 95% in IR (last 90 days) · Originality: Incremental advance
AI Analysis

This addresses scalability and efficiency problems for developers of generative recommender systems, though it appears incremental as it builds on existing Transformer-based sequence modeling.

The paper tackles inefficiencies in generative recommender systems caused by interleaving item and action tokens, which doubles sequence length and increases computational overhead. The proposed architectures, AttnLFA and AttnMVP, eliminate interleaved dependencies and so halve sequence length (a 50% reduction in sequence complexity). They also improve evaluation loss by 0.29% and 0.80% and cut training time by 23% and 12%, respectively.
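To make the length-and-cost argument concrete, the following minimal Python sketch counts tokens and causal attention scores for a hypothetical history of `n` (item, action) events. The helper names `token_count` and `causal_attention_scores` are illustrative, not from the paper. The sketch counts only the query-key scores of a single attention head and ignores projections, feed-forward layers, and embedding lookups.

```python
def token_count(num_events: int, interleaved: bool) -> int:
    """Tokens needed to encode num_events (item, action) events."""
    # Interleaving emits [i_1, a_1, i_2, a_2, ...]: two tokens per event.
    # A non-interleaved layout keeps one position per event.
    return 2 * num_events if interleaved else num_events


def causal_attention_scores(seq_len: int) -> int:
    """Query-key scores computed by one causal self-attention head."""
    # Position t attends to positions 1..t, so the total is 1 + 2 + ... + L.
    return seq_len * (seq_len + 1) // 2


if __name__ == "__main__":
    n = 1000  # hypothetical user-history length, in events
    for interleaved in (True, False):
        length = token_count(n, interleaved)
        print(f"interleaved={interleaved}: tokens={length}, "
              f"scores={causal_attention_scores(length)}")
    ratio = (causal_attention_scores(token_count(n, True))
             / causal_attention_scores(token_count(n, False)))
    print(f"score ratio: {ratio:.2f}")
```

With `n = 1000`, the interleaved layout has 2,000 tokens and 2,001,000 scores, compared with 1,000 tokens and 500,500 scores without interleaving, roughly a 4x gap. The reported end-to-end training-time reductions (23% and 12%) are much smaller than this. A plausible reason is that attention scoring is only one part of the total training cost.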

Generative Recommender Systems (GR) increasingly model user behavior as a sequence generation task by interleaving item and action tokens. While effective, this formulation introduces significant structural and computational inefficiencies: it doubles sequence length, incurs quadratic attention overhead, and relies on implicit attention to recover the causal relationship between an item and its associated action. Furthermore, interleaving heterogeneous tokens forces the Transformer to disentangle semantically incompatible signals, leading to increased attention noise and reduced representation efficiency.

In this work, we propose a principled reformulation of generative recommendation that aligns sequence modeling with underlying causal structures and attention theory. We demonstrate that current interleaving mechanisms act as inefficient proxies for similarity-weighted action pooling. To address this, we introduce two novel architectures that eliminate interleaved dependencies to reduce sequence complexity by 50%: Attention-based Late Fusion for Actions (AttnLFA) and Attention-based Mixed Value Pooling (AttnMVP). These models explicitly encode the $i_n \rightarrow a_n$ causal dependency between the $n$-th item and the action taken on it, while preserving the expressive power of Transformer-based sequence modeling.

We evaluate our framework on large-scale product recommendation data from a major social network. Experimental results show that AttnLFA and AttnMVP consistently outperform interleaved baselines, achieving evaluation loss improvements of 0.29% and 0.80%, and significant gains in Normalized Entropy (NE). Crucially, these performance gains are accompanied by training time reductions of 23% and 12%, respectively. Our findings suggest that explicitly modeling item-action causality provides a superior design paradigm for scalable and efficient generative ranking.
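The abstract frames interleaving as an inefficient proxy for similarity-weighted action pooling. One way to read that claim is sketched below in plain Python. For each position $n$, the current item $i_n$ acts as the query, earlier items act as keys, and earlier actions (or item-action mixtures) act as values. A causal mask hides $a_n$ itself. This illustration rests on several assumptions and is not the authors' AttnLFA or AttnMVP:

- `causal_pool`, `late_fusion`, and `mixed_value_pooling` are hypothetical names.
- Learned query/key/value projections and multi-head structure are omitted.
- Plain concatenation stands in for whatever fusion or mixing the paper actually uses.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_pool(items: List[Vector], values: List[Vector]) -> List[Vector]:
    """Pool values[j] for j < n, weighted by similarity of items[n] to items[j].

    Queries and keys are the raw item embeddings (no learned projections).
    Only positions j < n are attended, so a_n is never visible when predicting it.
    """
    d = len(items[0])
    dim_v = len(values[0])
    pooled: List[Vector] = []
    for n, query in enumerate(items):
        if n == 0:
            # No earlier events to attend to: emit a zero vector.
            pooled.append([0.0] * dim_v)
            continue
        # Scaled dot-product similarity between i_n and each earlier item.
        scores = [sum(q * k for q, k in zip(query, items[j])) / math.sqrt(d)
                  for j in range(n)]
        weights = softmax(scores)
        pooled.append([sum(w * values[j][c] for j, w in enumerate(weights))
                       for c in range(dim_v)])
    return pooled


def late_fusion(items: List[Vector], actions: List[Vector]) -> List[Vector]:
    # Hypothetical AttnLFA-style head: pool past actions only, then
    # concatenate the pooled action summary with the current item embedding.
    pooled = causal_pool(items, actions)
    return [i + p for i, p in zip(items, pooled)]


def mixed_value_pooling(items: List[Vector], actions: List[Vector]) -> List[Vector]:
    # Hypothetical AttnMVP-style head: each value mixes (here: concatenates)
    # the item and action embeddings of that event before pooling.
    mixed = [i + a for i, a in zip(items, actions)]
    return causal_pool(items, mixed)


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # i_1, i_2, i_3
    actions = [[1.0], [0.0], [5.0]]              # a_1, a_2, a_3 (a_3 is the target)
    print(causal_pool(items, actions))
    print(late_fusion(items, actions))
    print(mixed_value_pooling(items, actions))
```

In the toy data, $i_3$ matches $i_1$, so position 3 weights $a_1$ more heavily than $a_2$, and its pooled value lands between 0.5 and 1.0. The target action $a_3 = 5.0$ never leaks in, because only earlier positions are attended. Each event occupies a single position, so the sequence is half the length of its interleaved counterpart.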
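Normalized Entropy, as commonly defined for click and conversion prediction, is the average binary log loss divided by the entropy of the background positive rate $\bar p$. A model that always predicts the base rate scores 1.0, and lower is better. The sketch below assumes binary labels for a single prediction task. How the paper aggregates NE across action types is not stated in the abstract, and `normalized_entropy` is a hypothetical helper.

```python
import math
from typing import Sequence


def normalized_entropy(labels: Sequence[int], preds: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Average binary log loss divided by the entropy of the base rate."""
    n = len(labels)
    # Mean cross-entropy; eps guards against log(0) for saturated predictions.
    log_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(labels, preds)
    ) / n
    base = sum(labels) / n  # background positive rate p-bar
    if base in (0.0, 1.0):
        raise ValueError("NE is undefined when all labels are identical")
    background_entropy = -(base * math.log(base)
                           + (1 - base) * math.log(1 - base))
    return log_loss / background_entropy


if __name__ == "__main__":
    labels = [1, 0, 0, 0]
    print(normalized_entropy(labels, [0.25] * 4))            # base-rate model
    print(normalized_entropy(labels, [0.9, 0.1, 0.1, 0.1]))  # sharper model
```

For labels `[1, 0, 0, 0]`, predicting the base rate of 0.25 everywhere gives an NE of 1.0, up to floating-point error. Sharper correct predictions push NE below 1.0.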
