Dynamic Assortment Selection and Pricing with Censored Preference Feedback
This work addresses the problem of maximizing seller revenue in e-commerce and retail by dynamically adjusting assortments and prices based on censored buyer feedback; it represents an incremental advance in online learning for revenue management.
The paper studies dynamic product selection and pricing under a censored multinomial logit (C-MNL) model, in which buyers filter out items priced above their valuations and then purchase based on their preferences. The goal is to maximize seller revenue while learning valuations and preferences from purchase feedback. The paper introduces algorithms that combine Lower Confidence Bound (LCB) pricing with either Upper Confidence Bound (UCB) or Thompson Sampling (TS) product selection, achieving regret bounds of \(\tilde{O}(d^{3/2}\sqrt{T/\kappa})\) and \(\tilde{O}(d^{2}\sqrt{T/\kappa})\), respectively, and validates them through simulations.
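The censor-then-choose mechanism can be illustrated by computing purchase probabilities for a single buyer. This is a minimal sketch under simplifying assumptions that are ours, not necessarily the paper's: valuations are treated as known and deterministic, preference scores do not depend on price, and the no-purchase option has weight 1. The function names `cmnl_choice_probs` and `expected_revenue` are hypothetical.

```python
import math
from typing import Dict, Sequence


def cmnl_choice_probs(
    prices: Sequence[float],
    valuations: Sequence[float],
    utilities: Sequence[float],
) -> Dict[int, float]:
    """Purchase probabilities under a simplified censored MNL model.

    Illustrative assumptions (not the paper's exact specification):
    - valuations are known and deterministic;
    - a product priced strictly above its valuation is filtered out (censored);
    - surviving products are chosen by a standard MNL over `utilities`,
      with an outside (no-purchase) option of weight 1.
    Returns {product index: probability}; key -1 is the no-purchase option.
    """
    if not (len(prices) == len(valuations) == len(utilities)):
        raise ValueError("prices, valuations and utilities must have equal length")
    # Censoring step: the buyer keeps only products priced at or below valuation.
    kept = [i for i, (p, v) in enumerate(zip(prices, valuations)) if p <= v]
    # MNL step over the surviving products plus the no-purchase option.
    weights = {i: math.exp(utilities[i]) for i in kept}
    denom = 1.0 + sum(weights.values())  # 1.0 is the no-purchase weight
    probs = {i: w / denom for i, w in weights.items()}
    probs[-1] = 1.0 / denom
    return probs


def expected_revenue(
    prices: Sequence[float],
    valuations: Sequence[float],
    utilities: Sequence[float],
) -> float:
    """Seller's expected revenue from one buyer for the offered assortment."""
    probs = cmnl_choice_probs(prices, valuations, utilities)
    return sum(prices[i] * q for i, q in probs.items() if i >= 0)
```

For example, with prices (1, 2, 3), valuations (1.5, 2.5, 2) and equal preference scores, the third product is censored, the two surviving products and the no-purchase option each have probability 1/3, and expected revenue is (1 + 2)/3 = 1. Raising a price increases revenue per sale only until it crosses the valuation, at which point the product drops out of consideration entirely; this asymmetry is what makes overpricing costly when valuations are unknown.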
In this study, we investigate the problem of dynamic multi-product selection and pricing by introducing a novel framework based on a \textit{censored multinomial logit} (C-MNL) choice model. In this model, sellers present a set of products with prices, and buyers filter out products priced above their valuation, purchasing at most one product from the remaining options based on their preferences. The goal is to maximize seller revenue by dynamically adjusting product offerings and prices, while learning both product valuations and buyer preferences through purchase feedback. To achieve this, we propose a Lower Confidence Bound (LCB) pricing strategy. By combining this pricing strategy with either an Upper Confidence Bound (UCB) or Thompson Sampling (TS) product selection approach, our algorithms achieve regret bounds of $\tilde{O}(d^{\frac{3}{2}}\sqrt{T/\kappa})$ and $\tilde{O}(d^{2}\sqrt{T/\kappa})$, respectively. Finally, we validate the performance of our methods through simulations, demonstrating their effectiveness.
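One way to read the LCB pricing idea is as deliberately underpricing by the current estimation uncertainty, so that offered products are unlikely to be censored. The sketch below assumes a linear valuation model $v_i = x_i^\top\theta^*$, with an estimate $\hat\theta$, Gram matrix $V$ and confidence radius $\beta$ supplied by some estimator; these symbols and the function `lcb_prices` are illustrative assumptions, not the paper's exact algorithm, and the UCB/TS selection step is omitted. If $\|\hat\theta - \theta^*\|_V \le \beta$, the Cauchy-Schwarz inequality gives $|x_i^\top(\hat\theta - \theta^*)| \le \beta\|x_i\|_{V^{-1}}$, so each LCB price is at most the true valuation.

```python
import numpy as np


def lcb_prices(
    features: np.ndarray,   # shape (n_products, d); rows are x_i
    theta_hat: np.ndarray,  # shape (d,); hypothetical valuation estimate
    gram: np.ndarray,       # shape (d, d); hypothetical Gram matrix V (positive definite)
    beta: float,            # hypothetical confidence radius
) -> np.ndarray:
    """Lower-confidence-bound prices under an assumed linear valuation model.

    price_i = x_i^T theta_hat - beta * sqrt(x_i^T V^{-1} x_i)
    """
    point_est = features @ theta_hat
    # Solve V Z = X^T instead of forming V^{-1}; row i of Z^T is V^{-1} x_i.
    v_inv_x = np.linalg.solve(gram, features.T).T
    # Row-wise quadratic form x_i^T V^{-1} x_i, then square root.
    widths = np.sqrt(np.einsum("ij,ij->i", features, v_inv_x))
    return point_est - beta * widths
```

As feedback accumulates and $V$ grows, the width term shrinks and prices approach the point estimate: with unit-vector features, $\hat\theta = (2, 3)$, $V = 4I$ and $\beta = 1$, the prices are $(1.5, 2.5)$, and with $V = 100I$ they rise to $(1.9, 2.9)$.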