Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem
This work addresses a practical challenge for e-commerce and retail companies that frequently introduce new products. Its contribution is incremental, however, since it builds on existing bandit and recommendation models.
The paper studies how a seller can learn customer preferences for new and existing products online through tiered recommendations. It proposes a sequential multinomial logit model and a learning algorithm with a quantified regret bound, with the aim of mitigating the risks associated with new product launches.
Motivated by the phenomenon that companies introduce new products to keep abreast of customers' rapidly changing tastes, we consider a novel online learning setting in which a profit-maximizing seller needs to learn customers' preferences by offering recommendations, which may contain both existing products and new products launched in the middle of a selling period. We propose a sequential multinomial logit (SMNL) model to characterize customers' behavior when product recommendations are presented in tiers. For the offline version, in which customers' preferences are known, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. For the online problem, we propose a learning algorithm and quantify its regret bound. Moreover, we extend the setting to incorporate a constraint that ensures every new product is learned to a given accuracy. Our results demonstrate that the tier structure can be used to mitigate the risks associated with learning new products.
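
To make the idea of tiered recommendations concrete, the following is a minimal sketch of one plausible tiered choice model. It is an illustrative assumption, not the SMNL specification from the paper, which the abstract does not state. The sketch assumes that a customer inspects tiers in order, makes a standard multinomial logit choice within each tier against an outside option of weight 1, and moves to the next tier only when choosing the outside option. The function names, the `weights` and `prices` dictionaries, and the product labels are all hypothetical.

```python
from typing import Dict, List, Tuple


def tiered_choice_probabilities(
    tiers: List[List[str]], weights: Dict[str, float]
) -> Tuple[Dict[str, float], float]:
    """Return (purchase probability per product, no-purchase probability).

    Assumed model (not taken from the paper): within each tier the customer
    makes an MNL choice among that tier's products and an outside option of
    weight 1. Choosing the outside option means moving on to the next tier,
    or leaving without a purchase after the last tier.
    """
    probs: Dict[str, float] = {}
    reach = 1.0  # probability that the customer reaches the current tier
    for tier in tiers:
        denom = 1.0 + sum(weights[p] for p in tier)
        for p in tier:
            probs[p] = reach * weights[p] / denom
        reach /= denom  # the customer skipped every product in this tier
    return probs, reach


def expected_revenue(
    tiers: List[List[str]], weights: Dict[str, float], prices: Dict[str, float]
) -> float:
    """Expected revenue per customer under the assumed tiered model."""
    probs, _ = tiered_choice_probabilities(tiers, weights)
    return sum(prices[p] * q for p, q in probs.items())


if __name__ == "__main__":
    # Hypothetical example: two existing products in tier 1, one new product in tier 2.
    example_tiers = [["existing_a", "existing_b"], ["new_c"]]
    example_weights = {"existing_a": 1.0, "existing_b": 1.0, "new_c": 1.0}
    example_prices = {"existing_a": 10.0, "existing_b": 8.0, "new_c": 12.0}
    probs, no_buy = tiered_choice_probabilities(example_tiers, example_weights)
    print(probs, no_buy)
    print(expected_revenue(example_tiers, example_weights, example_prices))
```

Under this assumed model, a product in a lower tier is seen only by customers who passed over every earlier tier, so placing a new product with an uncertain preference weight lower limits its effect on expected revenue while still generating some purchase data for learning. This is one way to read the abstract's claim about risk mitigation; the paper's actual mechanism may differ.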