Mihalis G. Markakis

SY Jul 24, 2012

Delay Stability Regions of the Max-Weight Policy under Heavy-Tailed Traffic

Mihalis G. Markakis, Eytan Modiano, John N. Tsitsiklis

We carry out a delay stability analysis (i.e., determine conditions under which expected steady-state delays at a queue are finite) for a simple 3-queue system operated under the Max-Weight scheduling policy, for the case where one of the queues is fed by heavy-tailed traffic (i.e., when the number of arrivals at each time slot has infinite second moment). This particular system exemplifies an intricate phenomenon whereby heavy-tailed traffic at one queue may or may not result in the delay instability of another queue, depending on the arrival rates. While the ordinary stability region (in the sense of convergence to a steady-state distribution) is straightforward to determine, the determination of the delay stability region is more involved: (i) we use "fluid-type" sample path arguments, combined with renewal theory, to prove delay instability outside a certain region; (ii) we use a piecewise linear Lyapunov function to prove delay stability in the interior of that same region; (iii) as an intermediate step in establishing delay stability, we show that the expected workload of a stable M/GI/1 queue scales with time as $\mathcal{O}(t^{1/(1+\gamma)})$, assuming that service times have a finite moment of order $1+\gamma$, where $\gamma\in (0,1)$.
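For readers unfamiliar with Max-Weight scheduling, the following is a minimal sketch of one time slot of the policy. The abstract does not specify which queues can be served together, so the set of feasible schedules is passed in as a parameter; the function names, the unit service rate, and the example schedules are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def max_weight_schedule(queue_lengths: Sequence[int],
                        schedules: Sequence[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Return the feasible schedule with the largest total queue length.

    Each schedule is a tuple of queue indices that may be served together.
    Ties go to the schedule listed first.
    """
    return max(schedules, key=lambda s: sum(queue_lengths[i] for i in s))


def step(queue_lengths: Sequence[int],
         arrivals: Sequence[int],
         schedules: Sequence[Tuple[int, ...]]) -> List[int]:
    """Advance the system by one slot: serve one job per scheduled queue, then add arrivals."""
    chosen = max_weight_schedule(queue_lengths, schedules)
    after_service = [
        max(q - (1 if i in chosen else 0), 0)  # assumed unit service rate
        for i, q in enumerate(queue_lengths)
    ]
    return [q + a for q, a in zip(after_service, arrivals)]


# Hypothetical conflict structure: queues 0 and 2 can be served together, queue 1 alone.
example_schedules = [(0, 2), (1,)]
next_state = step([5, 1, 1], [0, 2, 0], example_schedules)
```

A heavy-tailed arrival stream at one queue can keep that queue's weight dominant for long stretches, which is the mechanism by which other queues may be starved under this rule.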

LG Oct 16, 2017

On the Hardness of Inventory Management with Censored Demand Data

Gábor Lugosi, Mihalis G. Markakis, Gergely Neu

We consider a repeated newsvendor problem where the inventory manager has no prior information about the demand, and can access only censored/sales data. In analogy to multi-armed bandit problems, the manager needs to simultaneously "explore" and "exploit" with her inventory decisions, in order to minimize the cumulative cost. We make no probabilistic assumptions (in particular, neither independence nor time stationarity) regarding the mechanism that generates the demand sequence. Our goal is to shed light on the hardness of the problem, and to develop policies that perform well with respect to the regret criterion, that is, the difference between the cumulative cost of a policy and that of the best fixed action/static inventory decision in hindsight, uniformly over all feasible demand sequences. We show that a simple randomized policy, termed the Exponentially Weighted Forecaster, combined with a carefully designed cost estimator, achieves optimal scaling of the expected regret (up to logarithmic factors) with respect to all three key primitives: the number of time periods, the number of inventory decisions available, and the demand support. Through this result, we derive an important insight: the benefit from "information stalking" as well as the cost of censoring are both negligible in this dynamic learning problem, at least with respect to the regret criterion. Furthermore, we modify the proposed policy in order to perform well in terms of the tracking regret, that is, using as benchmark the best sequence of inventory decisions that switches a limited number of times. Numerical experiments suggest that the proposed approach outperforms existing ones (that are tailored to, or facilitated by, time stationarity) on nonstationary demand models. Finally, we extend the proposed approach and its analysis to a "combinatorial" version of the repeated newsvendor problem.
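To make the regret criterion concrete, here is a minimal sketch of a generic exponentially weighted forecaster on a repeated newsvendor problem. It is a simplification: it assumes the full demand is observed after each period, so it does not implement the censored-data cost estimator described in the abstract. The cost parameters, the learning rate `eta`, and all function names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def newsvendor_cost(order: int, demand: int,
                    holding: float = 1.0, backorder: float = 1.0) -> float:
    """Per-period cost: holding cost on leftover stock plus backorder cost on unmet demand."""
    return holding * max(order - demand, 0) + backorder * max(demand - order, 0)


def exponential_weights(demands: Sequence[int], actions: Sequence[int],
                        eta: float, rng: random.Random) -> Tuple[float, float]:
    """Run the forecaster and return (policy cost, cost of best fixed action in hindsight).

    Full-information variant: every action's cost is updated each period,
    which is NOT possible with censored sales data.
    """
    cum_cost: List[float] = [0.0] * len(actions)
    policy_cost = 0.0
    for demand in demands:
        # Subtract the minimum before exponentiating for numerical stability.
        baseline = min(cum_cost)
        weights = [math.exp(-eta * (c - baseline)) for c in cum_cost]
        order = rng.choices(actions, weights=weights, k=1)[0]
        policy_cost += newsvendor_cost(order, demand)
        for k, action in enumerate(actions):
            cum_cost[k] += newsvendor_cost(action, demand)
    return policy_cost, min(cum_cost)


policy_cost, best_fixed_cost = exponential_weights(
    demands=[3] * 50, actions=[0, 1, 2, 3, 4], eta=0.5, rng=random.Random(0)
)
regret = policy_cost - best_fixed_cost  # the quantity the regret criterion measures
```

In the censored setting, the update loop over all actions would have to be replaced by an estimate built only from observed sales, which is where the paper's cost estimator comes in.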


2 Papers