Dynamic pricing and assortment under a contextual MNL demand
This work addresses optimization challenges in online retail and advertising by providing more efficient algorithms for sellers, though it is incremental as it builds on existing MNL and bandit frameworks.
The paper tackles dynamic pricing and assortment problems with unknown demand over T periods, proposing a randomized pricing policy and an optimistic algorithm that achieve improved regret guarantees, specifically O(d√T log(T)) and Õ(d√κ₂T + log(T)/κ₂), compared to prior bounds.
We consider dynamic multi-product pricing and assortment problems under an unknown demand over T periods, where in each period, the seller decides on the price for each product or the assortment of products to offer to a customer who chooses according to an unknown Multinomial Logit Model (MNL). Such problems arise in many applications, including online retail and advertising. We propose a randomized dynamic pricing policy based on a variant of the Online Newton Step algorithm (ONS) that achieves a $O(d\sqrt{T}\log(T))$ regret guarantee under an adversarial arrival model. We also present a new optimistic algorithm for the adversarial MNL contextual bandits problem, which achieves a better dependency than the state-of-the-art algorithms in a problem-dependent constant $κ_2$ (potentially exponentially small). Our regret upper bound scales as $\tilde{O}(d\sqrt{κ_2 T}+ \log(T)/κ_2)$, which gives a stronger bound than the existing $\tilde{O}(d\sqrt{T}/κ_2)$ guarantees.