SY · Mar 15

On Globally Optimal Stochastic Policy Gradient Methods for Domain Randomized LQR Synthesis

arXiv:2603.14197 · 55.5 · h-index: 8
AI Analysis

This work addresses the lack of theoretically motivated principles for domain randomization in robotics, offering incremental improvements for robust policy synthesis.

The paper studies how to design optimization schemes for domain-randomized linear-quadratic regulator (LQR) synthesis, with the aim of reducing the sim-to-real gap. It shows that stochastic policy gradient descent that resamples systems at each step converges to global optima and yields better controllers, with lower variability, than approaches that optimize over a fixed set of systems.

Domain randomization is a simple, effective, and flexible scheme for obtaining robust feedback policies aimed at reducing the sim-to-real gap due to model mismatch. While domain randomization methods have yielded impressive demonstrations in the robotics-learning literature, general and theoretically motivated principles for designing optimization schemes that effectively leverage the randomization are largely unexplored. We address this gap by considering a stochastic policy gradient descent method for the domain randomized linear-quadratic regulator synthesis problem, a situation simple enough to provide theoretical guarantees. In particular, we demonstrate that stochastic gradients obtained by repeatedly sampling new systems at each gradient step converge to global optima with appropriate hyperparameter choices, and yield better controllers with lower variability when compared to approaches that do not resample. Sampling is often a quick and cheap operation, so computing policy gradients with newly sampled systems at each iteration is preferable to evaluating gradients on a fixed set of systems.
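The difference between resampling and using a fixed set of systems can be illustrated with a minimal sketch. This is not the paper's algorithm or experimental setup. It assumes a scalar discrete-time system x+ = a*x + b*u with feedback u = -k*x, unit weights q = r = 1, and E[x0^2] = 1, for which the infinite-horizon cost and its derivative in k have closed forms. The randomization ranges, batch size, step size, and all function names are hypothetical choices made for illustration.

```python
import random

def lqr_cost_and_grad(k, a, b, q=1.0, r=1.0):
    """Cost and d(cost)/dk for x+ = a*x + b*u, u = -k*x, E[x0^2] = 1."""
    c = a - b * k                          # closed-loop pole
    if abs(c) >= 1.0:
        raise ValueError("gain k does not stabilize the sampled system")
    p = (q + r * k * k) / (1.0 - c * c)    # cost-to-go coefficient (equals the cost here)
    sigma = 1.0 / (1.0 - c * c)            # accumulated state variance
    grad = 2.0 * (r * k - b * c * p) * sigma
    return p, grad

def sample_system(rng):
    """Hypothetical randomization ranges, chosen so that k = 0 is stabilizing."""
    return rng.uniform(0.7, 0.9), rng.uniform(0.8, 1.2)

def policy_gradient(resample, steps=500, batch=4, lr=0.005, seed=0):
    """Gradient descent on the batch-averaged cost; optionally redraw the batch each step."""
    rng = random.Random(seed)
    k = 0.0
    systems = [sample_system(rng) for _ in range(batch)]
    for _ in range(steps):
        if resample:
            # Fresh systems at every gradient step (the resampling scheme).
            systems = [sample_system(rng) for _ in range(batch)]
        grad = sum(lqr_cost_and_grad(k, a, b)[1] for a, b in systems) / batch
        k -= lr * grad
    return k

def average_cost(k, n=1000, seed=123):
    """Monte Carlo estimate of the domain-randomized cost on held-out systems."""
    rng = random.Random(seed)
    return sum(lqr_cost_and_grad(k, *sample_system(rng))[0] for _ in range(n)) / n

k_resampled = policy_gradient(resample=True)
k_fixed = policy_gradient(resample=False)
print("resampled:", k_resampled, average_cost(k_resampled))
print("fixed set:", k_fixed, average_cost(k_fixed))
```

Running `policy_gradient` over several seeds in both modes and comparing the spread of the returned gains is one way to probe the variability claim on this toy problem. The step size is deliberately small because the cost grows quickly as the closed-loop pole approaches the stability boundary.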

