LG · AI · RO · Feb 2, 2022

Do Differentiable Simulators Give Better Policy Gradients?

arXiv:2202.00817v2 · 134 citations
Originality: Incremental advance
AI Analysis

The paper addresses a key problem for reinforcement learning practitioners by analyzing how gradient estimators perform in complex control tasks, though the advance is incremental because it builds on existing gradient methods.

The paper investigates whether differentiable simulators improve policy gradients in reinforcement learning. It finds that characteristics of physical systems, such as stiffness or discontinuities, can hinder first-order estimators. It then proposes an α-order gradient estimator that combines the efficiency of first-order estimates with the robustness of zeroth-order methods, and demonstrates it on numerical examples.

Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on some numerical examples.
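
To make the contrast between the two estimators concrete, here is a minimal Python sketch on a one-dimensional toy problem. It is not taken from the paper or its code. It assumes the stochastic objective is F(θ) = E[f(θ + w)] with Gaussian noise w, and it assumes the α-order estimator is a plain convex combination of the first-order and zeroth-order estimates. The abstract does not state the form of the estimator or how α is chosen. All function names (`step`, `quad`, `alpha_order_grad`, and so on) are hypothetical.

```python
import random


def step(x):
    """Toy discontinuous cost (assumed for illustration): unit step at 0."""
    return 1.0 if x >= 0.0 else 0.0


def step_grad(x):
    """Pointwise derivative of the step: zero almost everywhere."""
    return 0.0


def quad(x):
    """Toy smooth cost (assumed for illustration)."""
    return x * x


def quad_grad(x):
    return 2.0 * x


def zeroth_order_grad(f, theta, sigma, n, rng):
    """Estimate dF/dtheta for F(theta) = E[f(theta + w)], w ~ N(0, sigma^2),
    using function values only. Subtracting f(theta) as a baseline leaves
    the expectation unchanged and reduces variance."""
    base = f(theta)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, sigma)
        total += (f(theta + w) - base) * w / sigma ** 2
    return total / n


def first_order_grad(df, theta, sigma, n, rng):
    """Estimate dF/dtheta by averaging the pointwise derivative df
    at the perturbed points theta + w."""
    total = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, sigma)
        total += df(theta + w)
    return total / n


def alpha_order_grad(f, df, theta, sigma, n, alpha, rng):
    """Assumed form: alpha * first-order + (1 - alpha) * zeroth-order."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    g1 = first_order_grad(df, theta, sigma, n, rng)
    g0 = zeroth_order_grad(f, theta, sigma, n, rng)
    return alpha * g1 + (1.0 - alpha) * g0


if __name__ == "__main__":
    rng = random.Random(0)
    for name, f, df, theta in [("step", step, step_grad, 0.0),
                               ("quad", quad, quad_grad, 1.0)]:
        for alpha in (0.0, 0.5, 1.0):
            g = alpha_order_grad(f, df, theta, 1.0, 20000, alpha, rng)
            print(f"{name}: alpha={alpha:.1f} gradient estimate={g:.4f}")
```

On the step cost, the first-order estimate is exactly zero because the pointwise derivative vanishes wherever it is defined, while the smoothed objective has a nonzero slope at θ = 0. This is one way a discontinuity can bias the first-order estimator. On the quadratic cost both estimators target the same value, 2θ, and differ only in their variance.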

Code Implementations: 1 repo
