CLAIIR · Jul 19, 2025

GRACE: Generative Recommendation via Journey-Aware Sparse Attention on Chain-of-Thought Tokenization

arXiv:2507.14758v1 · 4 citations · h-index: 25 · RecSys
Originality: Incremental advance
AI Analysis

This paper addresses computational bottlenecks in generative recommendation systems for e-commerce applications and represents a strong incremental advance.

The paper tackles the inefficiency of generative models for multi-behavior recommendation by proposing GRACE, which combines Chain-of-Thought tokenization with Journey-Aware Sparse Attention. It reports up to +106.9% HR@10 improvement over the state-of-the-art baseline and up to 48% less attention computation on long sequences.

Generative models have recently demonstrated strong potential in multi-behavior recommendation systems, leveraging the expressive power of transformers and tokenization to generate personalized item sequences. However, their adoption is hindered by (1) the lack of explicit information for token reasoning, (2) high computational costs due to quadratic attention complexity and dense sequence representations after tokenization, and (3) limited multi-scale modeling over user history. In this work, we propose GRACE (Generative Recommendation via journey-aware sparse Attention on Chain-of-thought tokEnization), a novel generative framework for multi-behavior sequential recommendation. GRACE introduces a hybrid Chain-of-Thought (CoT) tokenization method that encodes user-item interactions with explicit attributes from product knowledge graphs (e.g., category, brand, price) over semantic tokenization, enabling interpretable and behavior-aligned generation. To address the inefficiency of standard attention, we design a Journey-Aware Sparse Attention (JSA) mechanism, which selectively attends to compressed, intra-, inter-, and current-context segments in the tokenized sequence. Experiments on two real-world datasets show that GRACE significantly outperforms state-of-the-art baselines, achieving up to +106.9% HR@10 and +106.7% NDCG@10 improvement over the state-of-the-art baseline on the Home domain, and +22.1% HR@10 on the Electronics domain. GRACE also reduces attention computation by up to 48% with long sequences.
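The headline numbers above are stated in HR@10 and NDCG@10. As a reading aid, here is a minimal sketch of how these metrics are commonly computed, assuming a next-item setup with exactly one held-out target per user. The function names, the toy rankings, and the relative-improvement formula are illustrative assumptions, not taken from the paper's code or evaluation protocol.

```python
import math

def hit_rate_at_k(ranked_items, target, k=10):
    """Return 1.0 if the held-out target is among the top-k items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k with a single relevant item: 1 / log2(rank + 1), or 0.0 on a miss.

    With one relevant item the ideal DCG is 1, so no extra normalization is needed.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)

def relative_improvement(new_score, baseline_score):
    """Percentage gain over a baseline (assumed meaning of figures like '+106.9%')."""
    return 100.0 * (new_score - baseline_score) / baseline_score

# Hypothetical toy data: one ranked list and one held-out target per user.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["a", "f", "z"]

# Average the per-user scores to get the dataset-level metric.
hr = sum(hit_rate_at_k(r, t) for r, t in zip(rankings, targets)) / len(targets)
ndcg = sum(ndcg_at_k(r, t) for r, t in zip(rankings, targets)) / len(targets)
print(f"HR@10={hr:.4f}  NDCG@10={ndcg:.4f}")
```

Under this reading, a relative improvement above 100% means the reported score is more than double the baseline's score on that dataset.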

