30.6OCMar 13
Weakly Time-Coupled Approximation of Markov Decision ProcessesNegar Soheili, Selvaprabu Nadarajah, Bo Yang
Finite-horizon Markov decision processes (MDPs) with high-dimensional exogenous uncertainty and endogenous states arise in operations and finance, including the valuation and exercise of Bermudan and real options, but face a scalability barrier as computational complexity grows with the horizon. A common approximation represents the value function using basis functions, but methods for fitting weights treat cross-stage optimization differently. Least squares Monte Carlo (LSM) fits weights via backward recursion and regression, avoiding joint optimization but accumulating error over the horizon. Approximate linear programming (ALP) and pathwise optimization (PO) jointly fit weights to produce upper bounds, but temporal coupling causes computational complexity to grow with the horizon. We show this coupling is an artifact of the approximation architecture, and develop a weakly time-coupled approximation (WTCA) where cross-stage dependence is independent of horizon. For any fixed basis function set, the WTCA upper bound is tighter than that of ALP and looser than that of PO, and converges to the optimal policy value as the basis family expands. We extend parallel deterministic block coordinate descent to the stochastic MDP setting exploiting weak temporal coupling. Applied to WTCA, weak coupling yields computational complexity independent of the horizon. Within equal time budget, solving WTCA accommodates more exogenous samples or basis functions than PO, yielding tighter bounds despite PO being tighter for fixed samples and basis functions. On Bermudan option and ethanol production instances, WTCA produces tighter upper bounds than PO and LSM in every instance tested, with near-optimal policies at longer horizons.
LGNov 21, 2020
Adaptive Risk Mitigation in Demand LearningParshan Pakiman, Boxiao Chen, Selvaprabu Nadarajah et al.
We study dynamic pricing of a product with an unknown demand distribution over a finite horizon. Departing from the standard no-regret learning environment in which prices can be adjusted at any time, we restrict price changes to predetermined points in time to reflect common retail practice. This constraint, coupled with demand model ambiguity and an unknown customer arrival pattern, imposes a high risk of revenue loss, as a price based on a misestimated demand model may be applied to many customers before it can be revised. We develop an adaptive risk learning (ARL) framework that embeds a data-driven ambiguity set (DAS) to quantify demand model ambiguity by adapting to the unknown arrival pattern. Initially, when arrivals are few, the DAS includes a broad set of plausible demand models, reflecting high ambiguity and revenue risk. As new data is collected through pricing, the DAS progressively shrinks, capturing the reduction in model ambiguity and associated risk. We establish the probabilistic convergence of the DAS to the true demand model and derive a regret bound for the ARL policy that explicitly links revenue loss to the data required for the DAS to identify the true model with high probability. The dependence of our regret bound on the arrival pattern is unique to our constrained dynamic pricing problem and contrasts with no-regret learning environments, where regret is constant and arrival-pattern independent. Relaxing the constraint on infrequent price changes, we show that ARL attains the known constant regret bound. Numerical experiments further demonstrate that ARL outperforms benchmarks that prioritize either regret or risk alone by adaptively balancing both without knowledge of the arrival pattern. This adaptive risk adjustment is crucial for achieving high revenues and low downside risk when prices are sticky and both demand and arrival patterns are unknown.
LGJan 9, 2020
Self-guided Approximate Linear ProgramsParshan Pakiman, Selvaprabu Nadarajah, Negar Soheili et al.
Approximate linear programs (ALPs) are well-known models based on value function approximations (VFAs) to obtain policies and lower bounds on the optimal policy cost of discounted-cost Markov decision processes (MDPs). Formulating an ALP requires (i) basis functions, the linear combination of which defines the VFA, and (ii) a state-relevance distribution, which determines the relative importance of different states in the ALP objective for the purpose of minimizing VFA error. Both these choices are typically heuristic: basis function selection relies on domain knowledge while the state-relevance distribution is specified using the frequency of states visited by a heuristic policy. We propose a self-guided sequence of ALPs that embeds random basis functions obtained via inexpensive sampling and uses the known VFA from the previous iteration to guide VFA computation in the current iteration. Self-guided ALPs mitigate the need for domain knowledge during basis function selection as well as the impact of the initial choice of the state-relevance distribution, thus significantly reducing the ALP implementation burden. We establish high probability error bounds on the VFAs from this sequence and show that a worst-case measure of policy performance is improved. We find that these favorable implementation and theoretical properties translate to encouraging numerical results on perishable inventory control and options pricing applications, where self-guided ALP policies improve upon policies from problem-specific methods. More broadly, our research takes a meaningful step toward application-agnostic policies and bounds for MDPs.
OCAug 7, 2019
A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation ConstraintsQihang Lin, Selvaprabu Nadarajah, Negar Soheili et al.
Stochastic convex optimization problems with expectation constraints (SOECs) are encountered in statistics and machine learning, business, and engineering. In data-rich environments, the SOEC objective and constraints contain expectations defined with respect to large datasets. Therefore, efficient algorithms for solving such SOECs need to limit the fraction of data points that they use, which we refer to as algorithmic data complexity. Recent stochastic first order methods exhibit low data complexity when handling SOECs but guarantee near-feasibility and near-optimality only at convergence. These methods may thus return highly infeasible solutions when heuristically terminated, as is often the case, due to theoretical convergence criteria being highly conservative. This issue limits the use of first order methods in several applications where the SOEC constraints encode implementation requirements. We design a stochastic feasible level set method (SFLS) for SOECs that has low data complexity and emphasizes feasibility before convergence. Specifically, our level-set method solves a root-finding problem by calling a novel first order oracle that computes a stochastic upper bound on the level-set function by extending mirror descent and online validation techniques. We establish that SFLS maintains a high-probability feasible solution at each root-finding iteration and exhibits favorable iteration complexity compared to state-of-the-art deterministic feasible level set and stochastic subgradient methods. Numerical experiments on three diverse applications validate the low data complexity of SFLS relative to the former approach and highlight how SFLS finds feasible solutions with small optimality gaps significantly faster than the latter method.