How Well Does Generative Recommendation Generalize?
This work helps researchers and practitioners understand and improve how recommendation systems generalize, though its contribution is incremental.
The paper investigates whether generative recommendation models generalize better than item ID-based models by categorizing data instances according to whether they require memorization or generalization. It finds that generative models excel at generalization while ID-based models are better at memorization, and it proposes a hybrid approach that improves overall performance.
A widely held hypothesis for why generative recommendation (GR) models outperform conventional item ID-based models is that they generalize better. However, there are few systematic ways to verify this hypothesis beyond a superficial comparison of overall performance. To address this gap, we categorize each data instance based on the specific capability required for a correct prediction: either memorization (reusing item transition patterns observed during training) or generalization (composing known patterns to predict unseen item transitions). Extensive experiments show that GR models perform better on instances that require generalization, whereas item ID-based models perform better when memorization is more important. To explain this divergence, we shift the analysis from the item level to the token level and show that what appears to be item-level generalization often reduces to token-level memorization for GR models. Finally, we show that the two paradigms are complementary. We propose a simple memorization-aware indicator that adaptively combines them on a per-instance basis, leading to improved overall recommendation performance.
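The categorization and the per-instance combination described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: an instance counts as memorization when the transition from the last history item to the target appears in training, the indicator is a smoothed count of outgoing transitions seen for the last history item, and the names `observed_transitions`, `categorize`, `outgoing_counts`, `memorization_weight`, and `blend_scores` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

Item = Hashable


def observed_transitions(train_sequences: Sequence[Sequence[Item]]) -> Counter:
    """Count every consecutive (previous item, next item) pair in training."""
    counts: Counter = Counter()
    for seq in train_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    return counts


def categorize(history: Sequence[Item], target: Item, transitions: Counter) -> str:
    """Assumed rule: 'memorization' if the last-item -> target transition
    was observed during training, otherwise 'generalization'."""
    if history and transitions[(history[-1], target)] > 0:
        return "memorization"
    return "generalization"


def outgoing_counts(transitions: Counter) -> Dict[Item, int]:
    """Total number of observed transitions leaving each item."""
    out: Dict[Item, int] = {}
    for (prev, _nxt), n in transitions.items():
        out[prev] = out.get(prev, 0) + n
    return out


def memorization_weight(
    history: Sequence[Item], out_counts: Dict[Item, int], smoothing: float = 1.0
) -> float:
    """Hypothetical indicator in [0, 1): higher when the last history item
    has many observed outgoing transitions. Uses no target information."""
    n = out_counts.get(history[-1], 0) if history else 0
    return n / (n + smoothing)


def blend_scores(
    id_scores: Dict[Item, float], gr_scores: Dict[Item, float], weight: float
) -> Dict[Item, float]:
    """Per-instance convex combination: weight on the ID-based model,
    (1 - weight) on the generative model. Missing scores count as 0."""
    items = set(id_scores) | set(gr_scores)
    return {
        item: weight * id_scores.get(item, 0.0)
        + (1.0 - weight) * gr_scores.get(item, 0.0)
        for item in items
    }


if __name__ == "__main__":
    train = [["a", "b", "c"], ["a", "b", "d"]]
    transitions = observed_transitions(train)
    out_counts = outgoing_counts(transitions)

    print(categorize(["a", "b"], "c", transitions))
    print(categorize(["a", "c"], "d", transitions))

    w = memorization_weight(["a", "b"], out_counts)
    print(blend_scores({"c": 0.9, "d": 0.1}, {"c": 0.2, "d": 0.8}, w))
```

Under these assumptions, instances whose last item was seen often in training lean on the ID-based scores, while instances with little or no observed transition evidence fall back to the generative scores. The indicator in the paper may be defined differently.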