LG, AI, RO | Nov 13, 2025

Harnessing Bounded-Support Evolution Strategies for Policy Refinement

arXiv:2511.09923v2
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of reliable policy refinement for robotics, though it appears incremental because it builds on existing Evolution Strategies and PPO methods.

The paper tackles the problem of improving already-competent robot policies when on-policy gradients are noisy. It proposes Triangular-Distribution Evolution Strategies (TD-ES), which combines bounded triangular noise with a centered-rank estimator for stable updates. In robotic manipulation tasks, TD-ES raised success rates by 26.5% relative to PPO and reduced variance.

Improving competent robot policies with on-policy RL is often hampered by noisy, low-signal gradients. We revisit Evolution Strategies (ES) as a policy-gradient proxy and localize exploration with bounded, antithetic triangular perturbations, suitable for policy refinement. We propose Triangular-Distribution ES (TD-ES), which pairs bounded triangular noise with a centered-rank finite-difference estimator to deliver stable, parallelizable, gradient-free updates. In a two-stage pipeline (PPO pretraining followed by TD-ES refinement), this preserves early sample efficiency while enabling robust late-stage gains. Across a suite of robotic manipulation tasks, TD-ES raises success rates by 26.5% relative to PPO and greatly reduces variance, offering a simple, compute-light path to reliable refinement.
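The abstract names three ingredients: bounded triangular noise, antithetic (mirrored) perturbations, and a centered-rank finite-difference estimator. A minimal NumPy sketch of how they might fit into one update step follows. It is an illustration under stated assumptions, not the authors' implementation: the names `centered_ranks` and `td_es_step`, the parameters `half_pairs`, `width`, and `lr`, the gradient scaling, and the toy quadratic objective are all hypothetical, since the page does not give the paper's exact estimator or settings.

```python
import numpy as np


def centered_ranks(x):
    """Map values to evenly spaced ranks in [-0.5, 0.5] (lowest value -> -0.5)."""
    ranks = np.empty(len(x), dtype=np.float64)
    ranks[np.argsort(x)] = np.arange(len(x))
    return ranks / (len(x) - 1) - 0.5


def td_es_step(theta, fitness_fn, rng, half_pairs=16, width=0.05, lr=0.02):
    """One ES update with bounded, antithetic triangular perturbations.

    All hyperparameter names and defaults are assumptions for illustration.
    """
    dim = theta.shape[0]
    # Symmetric triangular noise on [-1, 1] with mode 0: every perturbed
    # parameter stays within `width` of its current value (bounded support).
    eps = rng.triangular(-1.0, 0.0, 1.0, size=(half_pairs, dim))
    # Antithetic evaluation: score each direction and its mirror image.
    f_plus = np.array([fitness_fn(theta + width * e) for e in eps])
    f_minus = np.array([fitness_fn(theta - width * e) for e in eps])
    # Rank all 2 * half_pairs returns together, so the update depends on
    # their ordering and not on the raw reward scale.
    u = centered_ranks(np.concatenate([f_plus, f_minus]))
    u_plus, u_minus = u[:half_pairs], u[half_pairs:]
    # Finite-difference estimate built from the rank differences of each pair.
    grad = ((u_plus - u_minus)[:, None] * eps).sum(axis=0) / (2 * half_pairs * width)
    return theta + lr * grad


# Toy usage: refine parameters toward a known target (stand-in for episode return).
rng = np.random.default_rng(0)
target = np.ones(5)
theta = np.zeros(5)
for _ in range(300):
    theta = td_es_step(theta, lambda p: -np.sum((p - target) ** 2), rng)
```

In the two-stage pipeline the abstract describes, `theta` would start from PPO-pretrained policy weights and `fitness_fn` would return an episode return or success signal. Because each perturbation is evaluated independently, the two list comprehensions are the natural place to parallelize across workers.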

