ST OC PR ML
Apr 23, 2021

Learning to reflect: A unifying approach for data-driven stochastic control strategies

arXiv:2104.11496v1
13 citations
Originality: Incremental advance
AI Analysis

This work addresses the statistical challenge of data-driven stochastic control, a problem relevant across many fields of applied probability. It is rated incremental because it builds on known theoretical solutions and focuses on specific process classes.

The authors develop purely data-driven strategies for stochastic optimal control when the underlying dynamics are unknown. They show that finding efficient strategies for singular control problems essentially reduces to finding rate-optimal estimators, with respect to the sup-norm risk, of objects associated with the invariant distribution of the underlying ergodic process. They also show that in the Lévy case a fully data-driven strategy achieves regret of significantly better order than in the diffusion case.

Stochastic optimal control problems have a long tradition in applied probability, with the questions addressed being of high relevance in a multitude of fields. Even though theoretical solutions are well understood in many scenarios, their practicability suffers from the assumption of known dynamics of the underlying stochastic process, raising the statistical challenge of developing purely data-driven strategies. For the mathematically separated classes of continuous diffusion processes and Lévy processes, we show that developing efficient strategies for related singular stochastic control problems can essentially be reduced to finding rate-optimal estimators with respect to the sup-norm risk of objects associated with the invariant distribution of ergodic processes which determine the theoretical solution of the control problem. From a statistical perspective, we exploit the exponential $\beta$-mixing property as the common factor of both scenarios to drive the convergence analysis, indicating that relying on general stability properties of Markov processes is a sufficiently powerful and flexible approach to treat complex applications requiring statistical methods. We show moreover that in the Lévy case, even though jump processes are per se more difficult to handle both in statistics and control theory, a fully data-driven strategy with regret of significantly better order than in the diffusion case can be constructed.
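
To make the statistical side of the abstract concrete, the following is a minimal sketch of estimating the invariant density of an ergodic diffusion and measuring the error in sup-norm. It is not the authors' estimator or control strategy. The Ornstein-Uhlenbeck process, the Gaussian kernel, the bandwidth rule `n ** (-0.2)`, the evaluation grid, and all function names are assumptions chosen only for illustration, because the Ornstein-Uhlenbeck process has a known Gaussian invariant density to compare against.

```python
import numpy as np

def simulate_ou(theta, sigma, dt, n_steps, rng):
    """Simulate dX = -theta * X dt + sigma dW on a grid (exact transition law).

    Illustrative assumption: an Ornstein-Uhlenbeck process stands in for a
    generic ergodic diffusion with unknown dynamics.
    """
    decay = np.exp(-theta * dt)
    noise_sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * theta))
    x = np.empty(n_steps)
    # Start from the invariant law N(0, sigma^2 / (2 * theta)).
    x[0] = rng.normal(0.0, sigma / np.sqrt(2.0 * theta))
    z = rng.standard_normal(n_steps - 1)
    for i in range(1, n_steps):
        x[i] = decay * x[i - 1] + noise_sd * z[i - 1]
    return x

def kernel_density(path, grid, bandwidth):
    """Gaussian kernel estimate of the invariant density, evaluated on `grid`."""
    u = (grid[:, None] - path[None, :]) / bandwidth
    norm = len(path) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * u**2).sum(axis=1) / norm

def sup_norm_error(theta, sigma, n_steps, seed=0):
    """Sup-norm distance (over a fixed grid) between estimate and true density."""
    rng = np.random.default_rng(seed)
    path = simulate_ou(theta, sigma, dt=0.1, n_steps=n_steps, rng=rng)
    grid = np.linspace(-2.0, 2.0, 81)
    var = sigma**2 / (2.0 * theta)
    true_density = np.exp(-grid**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    bandwidth = n_steps ** (-0.2)  # generic rule of thumb, not a rate-optimal choice
    estimate = kernel_density(path, grid, bandwidth)
    return float(np.max(np.abs(estimate - true_density)))

if __name__ == "__main__":
    print(sup_norm_error(theta=1.0, sigma=1.0, n_steps=20000))
```

In a data-driven control setting, an estimate of this kind would be plugged into the formula that characterizes the theoretical solution in place of the unknown true quantity. This is why the sup-norm, which controls the error uniformly over the state space, is the relevant risk.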

