Configuration Path Control
The work addresses stability issues in reinforcement learning for control systems, particularly in robotics, but is incremental, since it builds on existing methods.
The paper tackles the problem of brittle reinforcement learning policies, which generalize poorly and become unstable under small disturbances, by proposing a post-training stabilization method in the space of configuration paths. On a planar bipedal walker, the result is a two- to four-fold increase in stability, measured in terms of perturbation amplitudes.
Reinforcement learning methods often produce brittle policies: policies that perform well during training but generalize poorly beyond their direct training experience, and thus become unstable under small disturbances. To address this issue, we propose a method for stabilizing a control policy in the space of configuration paths. It is applied post-training and relies purely on the data produced during training, together with an instantaneous estimation of the control matrix. The approach is evaluated empirically on a planar bipedal walker subjected to a variety of perturbations, and the control policies obtained via reinforcement learning are compared against their stabilized counterparts. Across different experiments, we find a two- to four-fold increase in stability when measured in terms of the perturbation amplitudes. We also provide a zero-dynamics interpretation of our approach.
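The abstract does not say how the instantaneous control matrix is estimated. As a minimal sketch, assuming control-affine dynamics q̈ = f(q, q̇) + B(q) u and that small control perturbations Δu can be applied at a fixed state (for example, by resetting a simulator), the local control matrix B could be fitted by least squares from the resulting acceleration changes Δq̈ ≈ B Δu. The function name `estimate_control_matrix`, the array layout, and the synthetic data are hypothetical illustrations, not the authors' procedure.

```python
import numpy as np


def estimate_control_matrix(delta_u: np.ndarray, delta_acc: np.ndarray) -> np.ndarray:
    """Least-squares estimate of B in delta_acc ≈ B @ delta_u (hypothetical helper).

    delta_u:   (k, m) array of small control perturbations applied at one state.
    delta_acc: (k, n) array of the resulting changes in generalized acceleration.
    Returns B_hat with shape (n, m).
    """
    # Stacking the k samples row-wise gives delta_u @ B.T ≈ delta_acc;
    # solve for B.T in the least-squares sense, then transpose.
    bt, *_ = np.linalg.lstsq(delta_u, delta_acc, rcond=None)
    return bt.T


# Synthetic check: a made-up "true" B for n = 3 joints and m = 2 actuators.
rng = np.random.default_rng(0)
B_true = np.array([[1.0, 0.0],
                   [0.5, -0.3],
                   [0.0, 2.0]])

k = 20  # number of probing perturbations (assumed; needs k >= m)
delta_u = 0.1 * rng.standard_normal((k, 2))           # small control probes
noise = 1e-5 * rng.standard_normal((k, 3))            # assumed measurement noise
delta_acc = delta_u @ B_true.T + noise                 # observed acceleration changes

B_hat = estimate_control_matrix(delta_u, delta_acc)
print(np.round(B_hat, 3))
```

Least squares is used here because it averages out measurement noise when more probes than actuators are available; with exactly m noise-free, linearly independent probes it reduces to solving a linear system.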