LG · Aug 11, 2023

Learning Control Policies for Variable Objectives from Offline Data

arXiv:2308.06127v1 · 10 citations · h-index: 23
Originality: Incremental advance
AI Analysis

This is an incremental improvement to offline RL for control systems: a trained policy can be adapted to new objectives at runtime.

The paper tackles offline reinforcement learning for dynamical systems by introducing the variable objective policy (VOP), a policy trained to generalize over multiple objectives that parameterize the reward function. Because the objectives are an input to the policy, its behavior can be adjusted at runtime without additional data collection or retraining.

Offline reinforcement learning provides a viable approach to obtain advanced control strategies for dynamical systems, in particular when direct interaction with the environment is not available. In this paper, we introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP). With this approach, policies are trained to generalize efficiently over a variety of objectives, which parameterize the reward function. We demonstrate that by altering the objectives passed as input to the policy, users gain the freedom to adjust its behavior or re-balance optimization targets at runtime, without need for collecting additional observation batches or re-training.
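
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the scalar surrogate dynamics `model_step` (standing in for a model learned from offline data), the reward weighting `w`, the linear policy, and the simple hill-climbing search used in place of the paper's model-based policy search. What it shows is the structure: the objective `w` parameterizes the reward, is sampled during training, and is passed to the policy as an extra input.

```python
import numpy as np

# Hypothetical training grid: start states and objective weights (assumptions).
X0S = np.linspace(-1.0, 1.0, 5)
WS = np.linspace(0.1, 0.9, 5)

def model_step(x, u):
    """Hypothetical surrogate dynamics; stands in for a model fit to offline data."""
    return 0.9 * x + 0.5 * u

def reward(x, u, w):
    """Reward parameterized by objective w in [0, 1]:
    w weights the state error, (1 - w) weights the control effort."""
    return -(w * x**2 + (1.0 - w) * u**2)

def policy(theta, x, w):
    """Linear state feedback whose gain depends on the objective w."""
    gain = theta[0] + theta[1] * w
    return -gain * x

def rollout_return(theta, x0, w, horizon=20):
    """Sum of rewards when the policy is rolled out through the surrogate model."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = policy(theta, x, w)
        total += reward(x, u, w)
        x = model_step(x, u)
    return total

def averaged_return(theta, x0s=X0S, ws=WS):
    """Mean return over start states AND objectives, so one policy serves all w."""
    return float(np.mean([rollout_return(theta, x0, w) for x0 in x0s for w in ws]))

def train(seed=0, iters=300, step=0.1):
    """Random-search hill climbing (a stand-in for the paper's policy search)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    best = averaged_return(theta)
    for _ in range(iters):
        candidate = theta + step * rng.standard_normal(2)
        score = averaged_return(candidate)
        if score > best:  # keep only improving candidates
            theta, best = candidate, score
    return theta

theta = train()
# At runtime, behavior is changed only by passing a different objective w.
u_tracking = policy(theta, 1.0, 0.9)   # emphasis on state error
u_economy = policy(theta, 1.0, 0.1)    # emphasis on control effort
```

In this sketch, re-balancing the optimization targets after training means nothing more than calling `policy` with a different `w`; no new rollouts or parameter updates are involved.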
