LG GT MA TH ML Nov 2, 2022

Learning to Price Supply Chain Contracts against a Learning Retailer

Xuejun Zhao, Ruihao Zhu, William B. Haskell

arXiv:2211.04586v13.32 citations h-index: 17

Originality: Incremental advance

AI Analysis

This paper provides a data-driven approach to supply chain contract pricing against adaptive agents, though its contribution is incremental: it extends online learning to a multi-agent setting.

The paper tackles the problem of a supplier designing pricing policies against a learning retailer under uncertain demand. It achieves sublinear regret bounds across several retailer learning strategies without prior knowledge of the retailer's learning policy or the demand realizations; only the support of the demand distribution is required.

The rise of big data analytics has automated the decision-making of companies and increased supply chain agility. In this paper, we study the supply chain contract design problem faced by a data-driven supplier who needs to respond to the inventory decisions of the downstream retailer. Both the supplier and the retailer are uncertain about the market demand and need to learn about it sequentially. The goal for the supplier is to develop data-driven pricing policies with sublinear regret bounds under a wide range of possible retailer inventory policies for a fixed time horizon. To capture the dynamics induced by the retailer's learning policy, we first make a connection to non-stationary online learning by following the notion of variation budget. The variation budget quantifies the impact of the retailer's learning strategy on the supplier's decision-making. We then propose dynamic pricing policies for the supplier for both discrete and continuous demand. We also note that our proposed pricing policy only requires access to the support of the demand distribution, but critically, does not require the supplier to have any prior knowledge about the retailer's learning policy or the demand realizations. We examine several well-known data-driven policies for the retailer, including sample average approximation, distributionally robust optimization, and parametric approaches, and show that our pricing policies lead to sublinear regret bounds in all these cases. At the managerial level, we answer affirmatively that there is a pricing policy for the supplier with a sublinear regret bound under a wide range of retailer learning policies, even though the supplier faces a learning retailer and an unknown demand distribution. Our work also provides a novel perspective in data-driven operations management where the principal has to learn to react to the learning policies employed by other agents in the system.
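
To make the setting concrete, the following is a minimal sketch of one retailer policy the abstract names, sample average approximation (SAA), and of how a learning retailer's orders drift over time. It is not the paper's algorithm. It assumes a standard newsvendor retailer who buys at a wholesale price and sells at a fixed retail price, so the SAA order is the empirical demand quantile at the critical ratio (retail price minus wholesale price, divided by retail price). The function names, the uniform integer demand on 0 to 20, and the path-length measure `order_path_variation` are all hypothetical choices made for illustration; the paper's exact definition of the variation budget is not reproduced here.

```python
import math
import random


def saa_order(samples, wholesale_price, retail_price):
    """Hypothetical SAA newsvendor order.

    Returns the smallest observed demand q whose empirical CDF is at
    least the critical ratio (retail_price - wholesale_price) / retail_price.
    """
    if not samples:
        raise ValueError("need at least one demand sample")
    if not 0 <= wholesale_price <= retail_price or retail_price <= 0:
        raise ValueError("need 0 <= wholesale_price <= retail_price, retail_price > 0")
    ratio = (retail_price - wholesale_price) / retail_price
    ordered = sorted(samples)
    # Index of the order statistic that first reaches the critical ratio.
    k = max(1, math.ceil(ratio * len(ordered)))
    return ordered[k - 1]


def order_path_variation(orders):
    """Illustrative path length: sum of absolute changes between
    consecutive orders (an assumption, not the paper's definition)."""
    return sum(abs(b - a) for a, b in zip(orders, orders[1:]))


def simulate(horizon=200, wholesale_price=4.0, retail_price=10.0, seed=0):
    """Retailer re-solves SAA each period at a fixed wholesale price,
    then observes one new demand draw (uniform integers on 0..20, assumed)."""
    rng = random.Random(seed)
    samples = [rng.randint(0, 20)]
    orders = []
    for _ in range(horizon):
        orders.append(saa_order(samples, wholesale_price, retail_price))
        samples.append(rng.randint(0, 20))
    return orders


if __name__ == "__main__":
    orders = simulate()
    print("first orders:", orders[:10])
    print("total variation of orders:", order_path_variation(orders))
```

Even with the wholesale price held fixed, the retailer's order changes as its sample grows, so the supplier faces a non-stationary response. A quantity like the path length above is one simple way to see why the abstract relates the problem to non-stationary online learning.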
