Learning Decisions Offline from Censored Observations with ε-insensitive Operational Costs
This work addresses a managerial decision-making problem in operations research, offering incremental improvements in handling censored data in offline settings.
The paper tackles the problem of making data-driven decisions from offline datasets with censored observations, without assuming underlying distributions, by designing ε-insensitive operational costs to handle unobserved censoring. The results show that the proposed methods, built on linear regression and neural network models, outperform existing approaches with cost savings of up to 14.40% and 12.21%, and produce order quantities significantly closer to the optimal solutions.
Many important managerial decisions are made based on censored observations. Making decisions without adequately handling the censoring leads to inferior outcomes. We investigate the data-driven decision-making problem with an offline dataset containing the feature data and the censored historical data of the variable of interest, without the censoring indicators. Without assuming the underlying distribution, we design and leverage ε-insensitive operational costs to deal with the unobserved censoring in an offline data-driven fashion. We demonstrate the customization of the ε-insensitive operational costs for a newsvendor problem and use such costs to train two representative ML models: linear regression (LR) models and neural networks (NNs). We derive tight generalization bounds for the custom LR model without regularization (LR-εNVC) and with regularization (LR-εNVC-R), and a high-probability generalization bound for the custom NN (NN-εNVC) trained by stochastic gradient descent. The theoretical results reveal the stability and learnability of LR-εNVC, LR-εNVC-R and NN-εNVC. We conduct extensive numerical experiments to compare LR-εNVC-R and NN-εNVC with two existing approaches, estimate-as-solution (EAS) and integrated estimation and optimization (IEO). The results show that LR-εNVC-R and NN-εNVC outperform both EAS and IEO, with cost savings of up to 14.40% and 12.21% compared to the lowest cost generated by the two existing approaches. In addition, the order quantities of LR-εNVC-R and NN-εNVC are statistically significantly closer to the optimal solutions that would be obtained if the underlying distribution were known.
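To make the idea of an ε-insensitive operational cost concrete, the sketch below contrasts a standard newsvendor cost with one possible ε-insensitive variant. The abstract does not give the paper's exact cost definition, so the functional form here (a dead zone of width ε on both sides, borrowed from the ε-insensitive loss of support vector regression) is an assumption, as are the function names and the unit costs `cu` and `co`.

```python
import numpy as np

def newsvendor_cost(q, d, cu=2.0, co=1.0):
    """Standard newsvendor cost: cu per unit of shortage, co per unit of surplus.

    cu and co are illustrative values, not taken from the paper.
    """
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    return cu * np.maximum(d - q, 0.0) + co * np.maximum(q - d, 0.0)

def eps_insensitive_newsvendor_cost(q, d, eps, cu=2.0, co=1.0):
    """ASSUMED form of an epsilon-insensitive newsvendor cost.

    Deviations between the order quantity q and the observed value d that are
    no larger than eps cost nothing; larger deviations are charged linearly on
    the excess. This is a hypothetical illustration, not the paper's definition.
    """
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    shortage = np.maximum(d - q - eps, 0.0)
    surplus = np.maximum(q - d - eps, 0.0)
    return cu * shortage + co * surplus

# Observed (possibly censored) values and candidate order quantities.
observed = np.array([10.5, 13.0, 6.0])
orders = np.array([10.0, 10.0, 10.0])
plain = newsvendor_cost(orders, observed)
tolerant = eps_insensitive_newsvendor_cost(orders, observed, eps=1.0)
```

Under this assumed form, setting `eps=0` recovers the standard newsvendor cost, and a larger `eps` makes training less sensitive to small gaps between the order quantity and an observation that may understate true demand because of censoring.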