Shape-constrained Estimation of Value Functions
This work addresses value function estimation, a recurring problem in approximate dynamic programming and applied probability, and offers an incremental improvement by integrating soft information, such as known shape properties, into nonparametric estimation.
The authors estimate value functions for Markov chains in the infinite-horizon discounted reward setting. They develop a fully nonparametric, simulation-based method that incorporates shape constraints such as convexity or monotonicity, and the resulting estimator is provably consistent as the simulated time horizon tends to infinity.
We present a fully nonparametric method to estimate the value function, via simulation, in the context of expected infinite-horizon discounted rewards for Markov chains. Estimating such value functions plays an important role in approximate dynamic programming and applied probability in general. We incorporate "soft information" into the estimation algorithm, such as knowledge of convexity, monotonicity, or Lipschitz constants. In the presence of such information, a nonparametric estimator for the value function can be computed that is provably consistent as the simulated time horizon tends to infinity. As an application, we implement our method on price tolling agreement contracts in energy markets.
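To make the idea of combining simulation with shape information concrete, the following is a minimal sketch and not the estimator developed in this work. It assumes a finite-state, discrete-time toy chain (a reflected random walk), a reward r(x) = x, and a discount factor of 0.9, all of which are hypothetical choices made for illustration. Because this chain preserves the ordering of states and the reward is increasing, the true value function is nondecreasing. The sketch averages simulated discounted returns per state and then imposes monotonicity with a weighted isotonic least-squares fit (pool adjacent violators), a standard technique swapped in here as a stand-in for a general shape constraint.

```python
import numpy as np

def simulate_chain(n_states, n_steps, rng):
    """Reflected random walk on {0, ..., n_states - 1} (hypothetical test chain)."""
    steps = rng.choice([-1, 0, 1], size=n_steps)
    path = np.empty(n_steps, dtype=int)
    x = n_states // 2
    for t in range(n_steps):
        path[t] = x
        x = min(max(x + steps[t], 0), n_states - 1)
    return path

def discounted_returns(rewards, gamma):
    """Backward recursion G_t = r_t + gamma * G_{t+1}, with G taken as 0 past the path end."""
    g = np.zeros(len(rewards))
    acc = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        acc = rewards[t] + gamma * acc
        g[t] = acc
    return g

def isotonic_fit(y, w):
    """Weighted least-squares projection onto nondecreasing sequences.

    Pool-adjacent-violators; assumes every weight is strictly positive.
    """
    blocks = []  # each block holds [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])

# Illustrative parameters (assumptions, not taken from the source).
rng = np.random.default_rng(0)
n_states, gamma = 8, 0.9
path = simulate_chain(n_states, 20_000, rng)
returns = discounted_returns(path.astype(float), gamma)  # reward r(x) = x

# Drop the last 200 time points, whose returns are visibly truncated
# (gamma**200 is negligible, so the remaining truncation bias is tiny).
keep = slice(0, len(path) - 200)
counts = np.bincount(path[keep], minlength=n_states)
sums = np.bincount(path[keep], weights=returns[keep], minlength=n_states)

raw = sums / counts                 # unconstrained per-state average
v_hat = isotonic_fit(raw, counts)   # monotone (shape-constrained) estimate
print(np.round(raw, 2))
print(np.round(v_hat, 2))
```

With a long simulated path the raw averages are usually already close to monotone, so the constrained fit changes little; the constraint matters most for rarely visited states, where the raw averages are noisy. Convexity or Lipschitz constraints would require a constrained least-squares solver rather than this simple pooling step.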