LG · Jun 23, 2012

Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms

arXiv:1206.5345v4 · 4 citations

AI Analysis

This paper addresses revenue optimization for sellers facing demand uncertainty and offers a theoretical improvement over standard bandit methods.

The paper tackles dynamic pricing under an unknown demand model by formulating it as a multi-armed bandit with dependent arms. It proposes a policy based on the likelihood ratio test that achieves bounded regret, in contrast to the logarithmically growing regret of bandits with independent arms.

We consider a dynamic pricing problem under unknown demand models. In this problem, a seller offers prices to a stream of customers and observes either success or failure in each sale attempt. The underlying demand model is unknown to the seller and can take one of N possible forms. In this paper, we show that this problem can be formulated as a multi-armed bandit with dependent arms. We propose a dynamic pricing policy based on the likelihood ratio test. We show that the proposed policy achieves complete learning, i.e., it offers a bounded regret, where regret is defined as the revenue loss with respect to the case with a known demand model. This is in sharp contrast with the logarithmically growing regret in multi-armed bandits with independent arms.
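
To make the setting concrete, here is a minimal Python sketch of the problem described in the abstract. It is not the paper's policy: it uses a simplified greedy maximum-likelihood rule (always price for the currently most likely demand model) as a stand-in for the likelihood ratio test. The price grid, the three demand models, and the names `PRICES`, `MODELS`, `best_price`, and `simulate` are all hypothetical. The sketch also assumes that every price yields a different purchase probability under every model, so each sale attempt is informative about which model is true.

```python
import math
import random

# Hypothetical price grid and N = 3 candidate demand models.
# Each model maps an offered price to a purchase probability.
# Assumption: at every price the three probabilities differ, so any
# offered price helps distinguish the models.
PRICES = [1.0, 2.0, 3.0]
MODELS = [
    {1.0: 0.90, 2.0: 0.30, 3.0: 0.10},
    {1.0: 0.80, 2.0: 0.70, 3.0: 0.20},
    {1.0: 0.95, 2.0: 0.85, 3.0: 0.60},
]


def best_price(model):
    """Price that maximizes expected revenue (price * purchase probability)."""
    return max(PRICES, key=lambda p: p * model[p])


def simulate(true_index, horizon, seed=0):
    """Run the greedy maximum-likelihood pricing rule for `horizon` customers.

    Returns (cumulative expected regret, index of the most likely model).
    """
    rng = random.Random(seed)
    true_model = MODELS[true_index]
    optimal_revenue = max(p * true_model[p] for p in PRICES)
    log_lik = [0.0] * len(MODELS)  # log-likelihood of each candidate model
    regret = 0.0

    for _ in range(horizon):
        # Price as if the currently most likely model were the true one.
        guess = max(range(len(MODELS)), key=lambda i: log_lik[i])
        price = best_price(MODELS[guess])

        # Observe success or failure of the sale attempt.
        sold = rng.random() < true_model[price]

        # One observation updates every model's likelihood: this is the
        # dependence between arms.
        for i, model in enumerate(MODELS):
            q = model[price]
            log_lik[i] += math.log(q if sold else 1.0 - q)

        # Revenue loss relative to knowing the true demand model.
        regret += optimal_revenue - price * true_model[price]

    estimate = max(range(len(MODELS)), key=lambda i: log_lik[i])
    return regret, estimate


if __name__ == "__main__":
    for horizon in (100, 1000, 10000):
        regret, estimate = simulate(true_index=1, horizon=horizon)
        print(horizon, round(regret, 2), estimate)
```

Under these assumptions, a single sale outcome at any price updates the likelihood of all N models at once, which is why the arms are dependent and why the regret can stop growing once the true model is identified. When some prices are uninformative (two models predict the same purchase probability), the greedy rule above can get stuck, and a more careful policy is needed.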
