Nonparametric Learning and Earning with One-Point Feedback under Nonstationarity
For firms using dynamic pricing with limited feedback in changing markets, this work offers a theoretically grounded method to learn and adapt without assuming a parametric form for demand.
The paper tackles dynamic pricing under nonstationary demand when only the revenue from a single posted price is observed in each period. It develops a nonparametric learning framework that combines revenue-based gradient approximations with a restarting mechanism to adapt to market changes, provides performance guarantees, and demonstrates effectiveness via simulations.
Firms increasingly rely on dynamic pricing to respond to evolving customer demand, yet in many applications they observe only the revenue generated by a single posted price in each period. At the same time, market conditions may shift gradually or abruptly due to changes in customer preferences, competition, or external shocks. These features create two intertwined challenges: learning the revenue-demand relationship from limited feedback and adapting pricing decisions to a changing environment. We study how a seller can learn and earn effectively under these constraints, without assuming a specific parametric form for demand. We develop a learning framework that updates prices using revenue-based gradient approximations constructed from one observation per period. To address environmental changes, we incorporate a restarting mechanism that periodically refreshes the learning process so that outdated information is discounted. When the degree of nonstationarity is unknown, we further introduce a meta-learning layer to adaptively hedge across multiple restarting schedules. We provide performance guarantees for our approach, showing how cumulative revenue loss relative to a fully informed benchmark depends on both the time horizon and the magnitude of market variation. Simulation experiments using synthetic and real-world data illustrate the effectiveness of the proposed procedures.
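To make the mechanics concrete, the following is a minimal sketch of how a one-point gradient update with periodic restarts might look for a single product. It is not the paper's algorithm: it assumes a standard one-point bandit gradient estimator (post a randomly perturbed price, then scale the observed revenue by the perturbation direction), a fixed restart period, and a restart that simply resets the base price to the midpoint of the feasible range. The function name `one_point_pricing`, its parameters (`delta`, `eta`, `restart_every`), and the linear demand curve in the usage example are all hypothetical choices made for illustration.

```python
import random


def one_point_pricing(revenue_fn, horizon, p_min, p_max, delta, eta,
                      restart_every, seed=0):
    """Illustrative one-point-feedback pricing loop with periodic restarts.

    revenue_fn(t, price) returns the revenue observed in period t at the
    posted price. Only this single number is used in each period.
    Assumes p_min + delta <= p_max - delta.
    """
    rng = random.Random(seed)
    # Keep the base price far enough from the bounds that the perturbed
    # price x +/- delta always stays inside [p_min, p_max].
    lo, hi = p_min + delta, p_max - delta
    center = (lo + hi) / 2.0
    x = center
    prices, revenues = [], []
    for t in range(horizon):
        # Restart (assumed form): discard the learned base price so that
        # information from an outdated demand regime is dropped.
        if t > 0 and t % restart_every == 0:
            x = center
        u = rng.choice((-1.0, 1.0))      # random perturbation direction
        price = x + delta * u            # the one price posted this period
        r = revenue_fn(t, price)         # the one observation this period
        grad = (r / delta) * u           # one-point gradient estimate
        # Projected gradient ascent step on revenue.
        x = min(hi, max(lo, x + eta * grad))
        prices.append(price)
        revenues.append(r)
    return prices, revenues


if __name__ == "__main__":
    # Hypothetical demand: linear in price, with an abrupt shift at t = 100.
    def revenue(t, price):
        intercept = 10.0 if t < 100 else 6.0
        return price * max(0.0, intercept - price)

    posted, earned = one_point_pricing(
        revenue, horizon=200, p_min=1.0, p_max=9.0,
        delta=0.5, eta=0.01, restart_every=50,
    )
    print(f"total revenue over {len(posted)} periods: {sum(earned):.2f}")
```

In this sketch the restart period is fixed in advance. The meta-learning layer described in the abstract, which hedges across several restarting schedules when the degree of nonstationarity is unknown, is not modeled here.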