Adaptive Risk Mitigation in Demand Learning
This work is significant for retailers facing sticky prices and unknown demand/arrival patterns, as it provides a method to mitigate revenue loss risk.
This paper addresses dynamic pricing with an unknown demand distribution and customer arrival patterns, where price changes are restricted to predetermined times. They developed an Adaptive Risk Learning (ARL) framework that uses a data-driven ambiguity set to quantify and reduce demand model ambiguity, demonstrating probabilistic convergence to the true demand model and outperforming benchmarks by adaptively balancing regret and risk.
We study dynamic pricing of a product with an unknown demand distribution over a finite horizon. Departing from the standard no-regret learning environment in which prices can be adjusted at any time, we restrict price changes to predetermined points in time to reflect common retail practice. This constraint, coupled with demand model ambiguity and an unknown customer arrival pattern, imposes a high risk of revenue loss, as a price based on a misestimated demand model may be applied to many customers before it can be revised. We develop an adaptive risk learning (ARL) framework that embeds a data-driven ambiguity set (DAS) to quantify demand model ambiguity by adapting to the unknown arrival pattern. Initially, when arrivals are few, the DAS includes a broad set of plausible demand models, reflecting high ambiguity and revenue risk. As new data is collected through pricing, the DAS progressively shrinks, capturing the reduction in model ambiguity and associated risk. We establish the probabilistic convergence of the DAS to the true demand model and derive a regret bound for the ARL policy that explicitly links revenue loss to the data required for the DAS to identify the true model with high probability. The dependence of our regret bound on the arrival pattern is unique to our constrained dynamic pricing problem and contrasts with no-regret learning environments, where regret is constant and arrival-pattern independent. Relaxing the constraint on infrequent price changes, we show that ARL attains the known constant regret bound. Numerical experiments further demonstrate that ARL outperforms benchmarks that prioritize either regret or risk alone by adaptively balancing both without knowledge of the arrival pattern. This adaptive risk adjustment is crucial for achieving high revenues and low downside risk when prices are sticky and both demand and arrival patterns are unknown.