IR AI | May 15, 2025

Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M

Dario Di Palma, Felice Antonio Merra, Maurizio Sfilio, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia

arXiv:2505.10212v1 | 19.823 citations | h-index: 40 | Has Code | SIGIR

Originality: Incremental advance

AI Analysis

This study addresses a problem for researchers and practitioners in recommender systems by highlighting how memorization can reduce the generalizability of findings and amplify biases. Its contribution is incremental, since it is a preliminary study of a single dataset.

The study investigated whether large language models (LLMs) memorize the MovieLens-1M recommendation dataset, finding that all tested models exhibited some memorization, and recommendation performance correlated with the extent of memorization.

Large Language Models (LLMs) have become increasingly central to recommendation scenarios due to their remarkable natural language understanding and generation capabilities. Although significant research has explored the use of LLMs for various recommendation tasks, little effort has been dedicated to verifying whether they have memorized public recommendation datasets as part of their training data. This is undesirable because memorization reduces the generalizability of research findings, as benchmarking on memorized datasets does not guarantee generalization to unseen datasets. Furthermore, memorization can amplify biases; for example, some popular items may be recommended more frequently than others. In this work, we investigate whether LLMs have memorized public recommendation datasets. Specifically, we examine two model families (GPT and Llama) across multiple sizes, focusing on one of the most widely used datasets in recommender systems: MovieLens-1M. First, we define dataset memorization as the extent to which item attributes, user profiles, and user-item interactions can be retrieved by prompting the LLMs. Second, we analyze the impact of memorization on recommendation performance. Lastly, we examine whether memorization varies across model families and model sizes. Our results reveal that all models exhibit some degree of memorization of MovieLens-1M, and that recommendation performance is related to the extent of memorization. We have made all the code publicly available at: https://github.com/sisinflab/LLM-MemoryInspector
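To make the abstract's definition of memorization more concrete, the following minimal sketch shows one way an item-level retrieval rate could be computed: ask a model for a movie's title given only its ID, and count exact matches. This is an illustration under stated assumptions, not the authors' prompts or metric. The names `parse_movies`, `item_coverage`, and `fake_model` are hypothetical, and `fake_model` is a stand-in for a real LLM call. The only detail taken from the dataset itself is the `MovieID::Title::Genres` line layout of MovieLens-1M.

```python
from typing import Callable, Dict, Iterable


def parse_movies(lines: Iterable[str]) -> Dict[str, str]:
    """Parse MovieLens-1M style 'MovieID::Title::Genres' lines into {id: title}."""
    items = {}
    for line in lines:
        movie_id, title, _genres = line.strip().split("::")
        items[movie_id] = title
    return items


def item_coverage(items: Dict[str, str], ask: Callable[[str], str]) -> float:
    """Fraction of items whose title the model reproduces from the ID alone.

    Matching is exact after trimming and lowercasing (an assumption made
    for this sketch; a real study may use a different matching rule).
    """
    if not items:
        return 0.0
    hits = sum(
        1
        for movie_id, title in items.items()
        if ask(movie_id).strip().lower() == title.strip().lower()
    )
    return hits / len(items)


# Hypothetical stand-in for an LLM call: it "remembers" one of three items.
def fake_model(movie_id: str) -> str:
    return {"1": "Toy Story (1995)"}.get(movie_id, "Unknown")


sample = [
    "1::Toy Story (1995)::Animation|Children's|Comedy",
    "2::Jumanji (1995)::Adventure|Children's|Fantasy",
    "3::Grumpier Old Men (1995)::Comedy|Romance",
]
coverage = item_coverage(parse_movies(sample), fake_model)
```

In practice `ask` would wrap a prompt sent to a GPT or Llama model, and the same pattern could be repeated for user profiles and user-item interactions to cover the three components named in the definition.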

