LG | SY | Dec 21, 2023

Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing

arXiv:2312.14000v1 | 2 citations | h-index: 46 | Has Code
Originality: Highly original
AI Analysis

The paper provides a purely inference-centric method for sequential decision-making in stochastic dynamical systems, offering a new approach to risk-sensitive control.

The paper tackles risk-sensitive stochastic optimal control by framing it as Markovian score climbing using a conditional particle filter, resulting in asymptotically unbiased gradient estimates for policy optimization without explicit value function learning.

Stochastic optimal control of dynamical systems is a crucial challenge in sequential decision-making. Recently, control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma. Nonetheless, a majority of these techniques only invoke the inference-control duality to derive a modified risk objective that is then addressed within a reinforcement learning framework. This paper introduces a novel perspective by framing risk-sensitive stochastic control as Markovian score climbing under samples drawn from a conditional particle filter. Our approach, while purely inference-centric, provides asymptotically unbiased estimates for gradient-based policy optimization with optimal importance weighting and no explicit value function learning. To validate our methodology, we apply it to the task of learning neural non-Gaussian feedback policies, showcasing its efficacy on numerical benchmarks of stochastic dynamical systems.
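To make the idea of score climbing under conditional particle filter samples more concrete, here is a minimal sketch on a toy problem. It is not the paper's implementation: it assumes a scalar linear system, a linear-Gaussian policy with a single gain `k` (rather than a neural non-Gaussian policy), a quadratic cost turned into a potential `exp(-ETA * x**2)`, and the plain single-trajectory score estimate rather than the Rao-Blackwellized one. All names and constants (`csmc`, `score`, `ETA`, `SIG_U`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np

T, N = 20, 64            # horizon and particle count (illustrative choices)
ETA = 1.0                # assumed risk/temperature parameter on the cost
SIG_U, SIG_X = 0.5, 0.1  # assumed policy and dynamics noise scales
X0 = 1.0                 # fixed initial state

def csmc(k, ref, rng):
    """One conditional particle filter sweep. ref is (x, u) or None."""
    X = np.empty((T + 1, N))
    U = np.empty((T, N))
    A = np.empty((T, N), dtype=int)
    X[0] = X0
    w = np.full(N, 1.0 / N)
    for t in range(T):
        A[t] = rng.choice(N, size=N, p=w)   # multinomial resampling
        if ref is not None:
            A[t, 0] = 0                     # reference keeps its own lineage
        xp = X[t, A[t]]
        U[t] = k * xp + SIG_U * rng.standard_normal(N)         # toy policy
        X[t + 1] = xp + U[t] + SIG_X * rng.standard_normal(N)  # toy dynamics
        if ref is not None:
            U[t, 0], X[t + 1, 0] = ref[1][t], ref[0][t + 1]    # pin slot 0
        logw = -ETA * X[t + 1] ** 2         # log-potential from the cost
        w = np.exp(logw - logw.max())
        w /= w.sum()
    # Draw one trajectory by final weight and trace its ancestry backwards.
    j = rng.choice(N, p=w)
    x, u = np.empty(T + 1), np.empty(T)
    x[T] = X[T, j]
    for t in range(T - 1, -1, -1):
        u[t] = U[t, j]
        j = A[t, j]
        x[t] = X[t, j]
    return x, u

def score(k, x, u):
    """Gradient of the policy log-density along one trajectory w.r.t. k."""
    return np.sum((u - k * x[:-1]) * x[:-1]) / SIG_U ** 2

rng = np.random.default_rng(0)
k = 0.0
traj = csmc(k, None, rng)                   # unconditional sweep to initialize
for i in range(300):
    traj = csmc(k, traj, rng)               # Markov kernel step on trajectories
    k += 0.02 / np.sqrt(1 + i) * score(k, *traj)  # decaying-step score ascent
print("learned gain:", k)
```

In this sketch no value function is fitted: each iteration draws a new trajectory from the conditional particle filter, conditioned on the previous one, and takes a stochastic gradient step on the policy log-density along it. Under these assumptions the gain should drift toward a stabilizing negative value.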

Code Implementations: 1 repo