Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity
This work addresses a specific bottleneck, loss of plasticity, in neural bandit algorithms used for applications such as personalized recommendation, offering an incremental improvement to existing Clustering of Neural Bandits (CNB) methods.
The paper tackles loss of plasticity in Clustering of Neural Bandits (CNB): neural parameters become rigid over time, which limits adaptation to non-stationary environments such as dynamic user preferences. The proposed Selective Reinitialization (SeRe) framework dynamically resets underutilized units. The authors prove sublinear cumulative regret in piecewise-stationary environments and report lower regret on six real-world recommendation datasets.
Clustering of Bandits (CB) methods enhance sequential decision-making by grouping bandits into clusters based on similarity and incorporating cluster-level contextual information, demonstrating effectiveness and adaptability in applications like personalized streaming recommendations. However, when CB algorithms are extended to their neural versions (commonly referred to as Clustering of Neural Bandits, or CNB), they suffer from loss of plasticity: neural network parameters become rigid and less adaptable over time, limiting their ability to adapt to non-stationary environments (e.g., dynamic user preferences in recommendation). To address this challenge, we propose Selective Reinitialization (SeRe), a novel bandit learning framework that dynamically preserves the adaptability of CNB algorithms in evolving environments. SeRe leverages a contribution utility metric to identify and selectively reset underutilized units, mitigating loss of plasticity while maintaining stable knowledge retention. Furthermore, when SeRe is combined with CNB algorithms, an adaptive change detection mechanism adjusts the reinitialization frequency according to the degree of non-stationarity, ensuring effective adaptation without unnecessary resets. Theoretically, we prove that SeRe enables sublinear cumulative regret in piecewise-stationary environments, outperforming traditional CNB approaches in long-term performance. Extensive experiments on six real-world recommendation datasets demonstrate that SeRe-enhanced CNB algorithms can effectively mitigate the loss of plasticity and achieve lower regret, improving adaptability and robustness in dynamic settings.
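
The idea of scoring hidden units by a contribution utility and resetting only the least useful ones can be illustrated with a minimal sketch. The abstract does not define the metric, so everything below is an assumption made for illustration: the utility is taken to be the mean absolute activation of a unit times the total magnitude of its outgoing weights (a form used in related work on loss of plasticity), and the function names `contribution_utility` and `selective_reinit` are hypothetical. It is not the authors' implementation and omits the adaptive change detection mechanism.

```python
import numpy as np


def contribution_utility(hidden_acts, w_out):
    """Assumed utility: mean |activation| times total |outgoing weight|.

    hidden_acts: (batch, n_hidden) activations of one hidden layer.
    w_out:       (n_hidden, n_out) weights leaving that layer.
    """
    return np.abs(hidden_acts).mean(axis=0) * np.abs(w_out).sum(axis=1)


def selective_reinit(w_in, w_out, utility, n_reset, rng):
    """Reset the n_reset lowest-utility hidden units in place.

    w_in: (n_in, n_hidden) weights entering the layer.
    Returns the indices of the units that were reset.
    """
    idx = np.argsort(utility)[:n_reset]
    n_in = w_in.shape[0]
    # Fresh incoming weights give the unit a new chance to learn.
    w_in[:, idx] = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, len(idx)))
    # Zero outgoing weights so the reset does not disturb current predictions.
    w_out[idx, :] = 0.0
    return idx


# Toy layer with 3 hidden units; unit 1 never activates, so its utility is 0.
rng = np.random.default_rng(0)
w_in = np.ones((2, 3))
w_out = np.array([[1.0], [5.0], [4.0]])
acts = np.array([[1.0, 0.0, 0.5],
                 [3.0, 0.0, 0.5]])

utility = contribution_utility(acts, w_out)
reset_idx = selective_reinit(w_in, w_out, utility, n_reset=1, rng=rng)
```

In this toy layer only the dead unit is reset, while the weights of the other two units are left untouched, which mirrors the stated goal of restoring plasticity while retaining existing knowledge.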