IR, LG
Aug 5, 2025

Measuring the stability and plasticity of recommender systems

Maria João Lavoura, Robert Jungnickel, João Vinagre

arXiv:2508.03941v2 | 3.6 | h-index: 1

Originality: Incremental advance

AI Analysis

The paper addresses the need for better long-term evaluation protocols for recommender systems. The contribution is incremental, as it builds on existing offline evaluation methods.

The paper tackles a limitation of standard offline evaluation of recommender systems: it fails to capture long-term behavior when models are retrained over time. The authors propose a methodology to measure stability (retaining past patterns) and plasticity (adapting to changes). Preliminary results on the GoodReads dataset show different profiles across algorithms and a possible trade-off between the two properties.

The typical offline protocol to evaluate recommendation algorithms is to collect a dataset of user-item interactions, use part of this dataset to train a model, and use the remaining data to measure how closely the model's recommendations match the observed user interactions. This protocol is straightforward, useful and practical, but it only captures the performance of a particular model trained at some point in the past. We know, however, that online systems evolve over time. In general, models should reflect such changes, so they are frequently retrained with recent data. But if this is the case, to what extent can we trust previous evaluations? How will a model perform when a different pattern (re)emerges? In this paper we propose a methodology to study how recommendation models behave when they are retrained. The idea is to profile algorithms according to their ability to, on the one hand, retain past patterns (stability) and, on the other hand, (quickly) adapt to changes (plasticity). We devise an offline evaluation protocol that provides detail on the long-term behavior of models and that is agnostic to datasets, algorithms and metrics. To illustrate the potential of this framework, we present preliminary results for three different types of algorithms on the GoodReads dataset. These results suggest different stability and plasticity profiles depending on the algorithmic technique, and a possible trade-off between stability and plasticity. Although additional experiments will be necessary to confirm these observations, they already illustrate the usefulness of the proposed framework for gaining insight into the long-term dynamics of recommendation models.
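The abstract does not give exact formulas, so the following is only a minimal sketch of how a retraining-aware protocol might be organized, under our own assumptions. It borrows the "evaluation matrix" idea common in continual-learning work: data is split into chronological windows, a model is retrained at each step t, and R[t][j] records its score on the test split of every window j seen so far. Here plasticity at step t is assumed to be R[t][t] (score on the newest window) and stability the mean of R[t][j] for j < t (score on older windows). The popularity recommender, the hit-rate metric, the toy data and all function names (`train_popularity`, `hit_rate_at_k`, `evaluation_matrix`, `stability_plasticity`) are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Interaction = Tuple[str, str]                         # (user_id, item_id)
Window = Tuple[List[Interaction], List[Interaction]]  # (train split, test split)


def train_popularity(history: Sequence[Interaction]) -> List[str]:
    """Hypothetical toy recommender: rank items by interaction count, ties by item id."""
    counts = Counter(item for _, item in history)
    return sorted(counts, key=lambda item: (-counts[item], item))


def hit_rate_at_k(ranking: List[str], test: Sequence[Interaction], k: int = 1) -> float:
    """Fraction of test interactions whose item is in the model's top-k list."""
    if not test:
        return 0.0
    top_k = set(ranking[:k])
    return sum(item in top_k for _, item in test) / len(test)


def evaluation_matrix(
    windows: Sequence[Window],
    train_fn: Callable[[Sequence[Interaction]], List[str]],
    metric_fn: Callable[[List[str], Sequence[Interaction]], float],
    cumulative: bool = True,
) -> List[List[float]]:
    """R[t][j]: score of the model retrained at step t on the test split of window j <= t.

    cumulative=True retrains on all training data seen so far (assumption);
    cumulative=False retrains only on the training split of window t.
    """
    matrix: List[List[float]] = []
    history: List[Interaction] = []
    for t, (train, _) in enumerate(windows):
        history = history + list(train) if cumulative else list(train)
        model = train_fn(history)
        matrix.append([metric_fn(model, windows[j][1]) for j in range(t + 1)])
    return matrix


def stability_plasticity(matrix: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Assumed definitions. Plasticity at step t: R[t][t] (newest window).
    Stability at step t >= 1: mean of R[t][j] over earlier windows j < t."""
    plasticity = [row[-1] for row in matrix]
    stability = [mean(row[:-1]) for row in matrix[1:]]
    return stability, plasticity


# Hypothetical toy data: item "a" dominates window 0, item "c" dominates window 1.
WINDOWS: List[Window] = [
    ([("u1", "a"), ("u2", "a"), ("u3", "a"), ("u4", "b")], [("u5", "a")]),
    ([("u1", "c"), ("u2", "c"), ("u6", "d")], [("u7", "c")]),
]

if __name__ == "__main__":
    for cumulative in (True, False):
        R = evaluation_matrix(WINDOWS, train_popularity, hit_rate_at_k, cumulative)
        label = "cumulative" if cumulative else "sliding"
        print(label, R, stability_plasticity(R))
```

On this toy data the two retraining strategies land at opposite ends: cumulative retraining keeps recommending "a" (stability 1.0 at step 1, plasticity 0.0), while retraining on the latest window only switches to "c" (stability 0.0, plasticity 1.0). This is merely a contrived illustration of the kind of trade-off the abstract mentions; the actual protocol, metrics and algorithms are described in the paper.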
