Privacy-Preserving Dynamic Assortment Selection
The paper addresses the privacy risks that dynamic recommendation systems pose to users and platforms, and offers a novel method with theoretical guarantees, although it builds on existing bandit models.
The paper tackles privacy concerns in personalized assortment recommendation by proposing a privacy-preserving dynamic assortment selection framework based on multinomial logit (MNL) bandits. The framework achieves a near-optimal regret bound of Õ(√T) and shows performance improvements in simulations and on the Expedia hotel dataset.
With the growing demand for personalized assortment recommendations, concerns over data privacy have intensified, highlighting the urgent need for effective privacy-preserving strategies. This paper presents a novel framework for privacy-preserving dynamic assortment selection using the multinomial logit (MNL) bandits model. Our approach employs a perturbed upper confidence bound method, integrating calibrated noise into user utility estimates to balance exploration and exploitation while ensuring robust privacy protection. We rigorously prove that our policy satisfies Joint Differential Privacy (JDP), which suits dynamic environments better than traditional differential privacy and effectively mitigates the risk of inference attacks. This analysis is built upon a novel objective perturbation technique tailored for MNL bandits, which is also of independent interest. Theoretically, we derive a near-optimal regret bound of $\tilde{O}(\sqrt{T})$ for our policy and explicitly quantify how privacy protection impacts regret. Through extensive simulations and an application to the Expedia hotel dataset, we demonstrate substantial performance improvements over the benchmark method.
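To make the selection step concrete, here is a minimal Python sketch, assuming a standard MNL choice model with a no-purchase option of utility 0 and known per-item revenues. The function names (`mnl_choice_probs`, `expected_revenue`, `perturbed_ucb_assortment`), the Gaussian noise, the `bonuses` exploration term, and the brute-force search over assortments of size at most `max_size` are hypothetical illustrations, not the authors' policy. In particular, `noise_scale` is not calibrated to any privacy budget here, so this sketch does not by itself satisfy JDP; the paper's guarantee rests on its objective perturbation analysis.

```python
import itertools
import math
import random


def mnl_choice_probs(utilities, assortment):
    """MNL purchase probability of each item in `assortment`.

    The outside (no-purchase) option is assumed to have utility 0,
    so it contributes exp(0) = 1 to the denominator.
    """
    weights = {i: math.exp(utilities[i]) for i in assortment}
    denom = 1.0 + sum(weights.values())
    return {i: w / denom for i, w in weights.items()}


def expected_revenue(utilities, revenues, assortment):
    """Expected revenue of offering `assortment` under the MNL model."""
    probs = mnl_choice_probs(utilities, assortment)
    return sum(revenues[i] * p for i, p in probs.items())


def perturbed_ucb_assortment(est_utilities, bonuses, revenues,
                             max_size, noise_scale, rng):
    """Hypothetical perturbed-UCB selection step (illustration only).

    Each utility estimate is inflated by an exploration bonus and
    perturbed with Gaussian noise of std `noise_scale`; the noise is
    NOT calibrated to a privacy budget in this sketch. The assortment
    maximizing optimistic expected revenue is found by brute force.
    """
    n = len(est_utilities)
    optimistic = [est_utilities[i] + bonuses[i] + rng.gauss(0.0, noise_scale)
                  for i in range(n)]
    best, best_rev = (), 0.0  # offering nothing earns 0
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(n), k):
            rev = expected_revenue(optimistic, revenues, subset)
            if rev > best_rev:
                best, best_rev = subset, rev
    return best, best_rev


if __name__ == "__main__":
    rng = random.Random(0)
    chosen, rev = perturbed_ucb_assortment(
        est_utilities=[0.5, 0.2, -0.1, 0.8],
        bonuses=[0.1, 0.1, 0.1, 0.1],
        revenues=[1.0, 2.0, 1.5, 0.8],
        max_size=2,
        noise_scale=0.05,
        rng=rng,
    )
    print(chosen, round(rev, 4))
```

The brute-force search is exponential in the number of items and is used only to keep the example short. It does show one MNL-specific effect: adding a low-revenue item can lower expected revenue, because it diverts purchase probability away from more profitable items.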