83.5LGMay 25
Optimal and Order-optimal Gated Priority-based Greedy Policies for Two-layer Multi-item Order FulfillmentXi Chen, Yuze Chen, Ziyi Chen et al.
We study how an e-commerce firm should make real-time fulfillment decisions in a two-layer distribution network when multi-item customer orders arrive sequentially and future demand is unknown. The central managerial tension is whether to use scarce front distribution center (FDC) inventory to save current fulfillment cost or preserve that inventory for future orders that may be more valuable to serve locally. We formulate an adversarial online model with multiple FDCs, one regional distribution center (RDC), multi-unit multi-item orders, and item-specific and time-varying variable costs. Our theoretical objective is to characterize when simple, interpretable, and implementable fulfillment rules can perform nearly as well as an optimal clairvoyant planner. We develop a family of Gated Priority-based Greedy policies, derive competitive-ratio guarantees under both time-varying and time-invariant cost structures, and establish matching or near-matching lower bounds for any online algorithm. Numerical experiments show that the proposed policies perform strongly relative to generalized myopic and forecast-based benchmarks. The analysis yields managerial guidance on when local inventory should be protected, when splitting orders is worth the fixed-cost burden, and how the relative magnitudes of fixed and variable costs determine the value of more sophisticated optimization.
OCJan 21
Online Linear Programming with ReplenishmentYuze Chen, Yuan Zhou, Baichuan Mo et al.
We study an online linear programming (OLP) model in which inventory is not provided upfront but instead arrives gradually through an exogenous stochastic replenishment process. This replenishment-based formulation captures operational settings, such as e-commerce fulfillment, perishable supply chains, and renewable-powered systems, where resources are accumulated gradually and initial inventories are small or zero. The introduction of dispersed, uncertain replenishment fundamentally alters the structure of classical OLPs, creating persistent stockout risk and eliminating advance knowledge of the total budget. We develop new algorithms and regret analyses for three major distributional regimes studied in the OLP literature: bounded distributions, finite-support distributions, and continuous-support distributions with a non-degeneracy condition. For bounded distributions, we design an algorithm that achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret. For finite-support distributions with a non-degenerate induced LP, we obtain $\mathcal{O}(\log T)$ regret, and we establish an $Ω(\sqrt{T})$ lower bound for degenerate instances, demonstrating a sharp separation from the classical setting where $\mathcal{O}(1)$ regret is achievable. For continuous-support, non-degenerate distributions, we develop a two-stage accumulate-then-convert algorithm that achieves $\mathcal{O}(\log^2 T)$ regret, comparable to the $\mathcal{O}(\log T)$ regret in classical OLPs. Together, these results provide a near-complete characterization of the optimal regret achievable in OLP with replenishment. Finally, we empirically evaluate our algorithms and demonstrate their advantages over natural adaptations of classical OLP methods in the replenishment setting.
LGSep 2, 2025
AdaSwitch: An Adaptive Switching Meta-Algorithm for Learning-Augmented Bounded-Influence ProblemsXi Chen, Yuze Chen, Yuan Zhou
We study a class of multi-period online decision-making problems with sequence-based predictions, which may be generated by machine learning models but whose accuracy is not guaranteed. In each period, the decision-maker observes the realized request and must take an irrevocable action that yields a reward or incurs a cost, without knowledge of future arrivals. We introduce a bounded-influence framework, in which past decisions and requests exert only limited impact on the future optimal reward. Within this framework, we propose the AdaSwitch meta-algorithm, which exploits predictions to attain performance close to the offline benchmark when predictions are accurate, while preserving classical competitive-ratio guarantees under highly inaccurate predictions. Our framework and meta-algorithm apply to diverse settings, including lead-time quotation in processing systems, the $k$-server problem, and online allocation of reusable resources. These applications illustrate the flexibility and broad applicability of our approach to learning-augmented online decision-making.
LGMay 2, 2025
A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented ProblemsXin Chen, Yuze Chen, Yuan Zhou
We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become progressively refined over time, and seeks decisions that are competitive with the hindsight optimal under all possible realizations of both parameters and predictions. We propose a minimax Markov Decision Process (minimax-MDP) framework, where the system state consists of an adversarially evolving environment state and an internal state controlled by the decision-maker. We introduce a set of future-imposed conditions that characterize the feasibility of minimax-MDPs and enable the design of efficient, often closed-form, robustly competitive policies. We illustrate the framework through three applications: multi-period inventory ordering with refining demand predictions, resource allocation with uncertain utility functions, and a multi-phase extension of the minimax-MDP applied to the inventory problem with time-varying ordering costs. Our results provide a tractable and versatile approach to robust online decision-making under predictive uncertainty.