RO · Jun 28, 2023
MRHER: Model-based Relay Hindsight Experience Replay for Sequential Object Manipulation Tasks with Sparse Rewards
Yuming Huang, Bin Ren, Ziming Xu et al.
Sparse rewards pose a significant challenge to achieving high sample efficiency in goal-conditioned reinforcement learning (RL). Specifically, in sequential manipulation tasks, the agent receives failure rewards until it successfully completes the entire manipulation task, which leads to low sample efficiency. To tackle this issue and improve sample efficiency, we propose a novel model-based RL framework called Model-based Relay Hindsight Experience Replay (MRHER). MRHER breaks down a continuous task into subtasks with increasing complexity and utilizes the previous subtask to guide the learning of the subsequent one. Instead of using Hindsight Experience Replay (HER) in every subtask, we design a new robust model-based relabeling method called Foresight relabeling (FR). FR predicts the future trajectory of the hindsight state and relabels the expected goal as a goal achieved on the virtual future trajectory. By incorporating FR, MRHER effectively captures more information from historical experiences, leading to improved sample efficiency, particularly in object-manipulation environments. Experimental results demonstrate that MRHER exhibits state-of-the-art sample efficiency in benchmark tasks, outperforming RHER by 13.79% and 14.29% in the FetchPush-v1 environment and FetchPickandPlace-v1 environment, respectively.
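The relabeling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dynamics model, the policy, the goal mapping, and the rollout horizon (`toy_model`, `toy_policy`, `achieved_goal`, `horizon`) are hypothetical stand-ins chosen only to make the example runnable. It assumes FR picks a hindsight state from the future of the same trajectory, rolls a model forward from it, and uses the goal achieved at the end of that virtual rollout as the relabeled goal.

```python
import numpy as np


def achieved_goal(state):
    # Hypothetical goal mapping: the goal space is the first two state dimensions.
    return state[:2]


def toy_model(state, action):
    # Hypothetical stand-in for a learned dynamics model: s' = s + a.
    return state + action


def toy_policy(state, goal):
    # Hypothetical stand-in for a goal-conditioned policy: small step towards the goal.
    action = np.zeros_like(state)
    action[:2] = np.clip(goal - state[:2], -0.1, 0.1)
    return action


def foresight_relabel(states, t, goal, rng, horizon=3,
                      model=toy_model, policy=toy_policy):
    """Return a relabeled goal for the transition at index t (sketch only)."""
    # 1. Pick a hindsight state from the future of the same trajectory, as HER does.
    k = rng.integers(t + 1, len(states))
    virtual = states[k].copy()
    # 2. Roll the model forward from the hindsight state to build a virtual future trajectory.
    for _ in range(horizon):
        virtual = model(virtual, policy(virtual, goal))
    # 3. Use the goal achieved at the end of the virtual trajectory as the new goal.
    return achieved_goal(virtual)


# Toy trajectory: the first state dimension advances by 0.1 per step.
rng = np.random.default_rng(0)
states = [np.array([0.1 * i, 0.0, 0.0]) for i in range(5)]
original_goal = np.array([1.0, 0.0])
new_goal = foresight_relabel(states, t=1, goal=original_goal, rng=rng)
```

Plain HER would stop after step 1 and relabel with `achieved_goal(states[k])`. The extra model rollout is what distinguishes the foresight variant in this sketch.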
SY · Apr 25, 2019
Tracking Performance Limitations of MIMO Networked Control Systems with Multiple Communication Constraints
Chao-Yang Chen, Weihua Gui, Lianghong Wu et al.
In this paper, the tracking performance limitation of networked control systems (NCSs) is studied. The NCSs are modeled as continuous-time linear multi-input multi-output (MIMO) systems with random reference noises. The controlled plants include unstable poles and non-minimum phase (NMP) zeros. The output feedback path is affected by multiple communication constraints. We focus on some basic communication constraints, including additive white noise (AWN), quantization noise, bandwidth, and encoder-decoder. The system performance is evaluated by the tracking error energy, and a two-degree-of-freedom (2DOF) controller is used. An explicit representation of the tracking performance is given in this paper. The results indicate that the tracking performance limitations depend on the internal characteristics of the plant (unstable poles and NMP zeros), the reference noises (the reference noise power distribution (RNPD) and its directions), and the characteristics of the communication constraints. Moreover, the tracking performance limitations are also affected by the angles between each transformed NMP zero direction and the RNPD direction, and by the angles between each transformed unstable pole direction and the direction of the communication constraint distribution/allocation. In addition, for MIMO NCSs, bandwidth (when no two channels are identical) always affects the direction of the unstable poles, and the channel allocation of bandwidth and encoder-decoder may serve as a feasible method for allocating performance among the channels. Lastly, an example is given to verify the effectiveness of the theoretical results.
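The plant characteristics named in the abstract, unstable poles and NMP zeros, can be made concrete with a small sketch. It uses a hypothetical single-input single-output plant G(s) = (s - 2) / ((s - 1)(s + 3)), which is not taken from the paper and is a deliberate simplification of the MIMO setting. It only locates the open right-half-plane poles and zeros and does not compute the performance limit itself.

```python
import numpy as np


def right_half_plane_roots(coeffs, tol=1e-9):
    """Roots of a polynomial (highest power first) with strictly positive real part."""
    roots = np.roots(coeffs)
    return roots[roots.real > tol]


# Hypothetical plant: G(s) = (s - 2) / (s^2 + 2 s - 3) = (s - 2) / ((s - 1)(s + 3)).
numerator = [1.0, -2.0]
denominator = [1.0, 2.0, -3.0]

nmp_zeros = right_half_plane_roots(numerator)         # zeros in the open right half-plane
unstable_poles = right_half_plane_roots(denominator)  # poles in the open right half-plane
```

For a MIMO plant, each such pole and zero also has an associated direction vector, which is what the angle conditions in the abstract refer to. This scalar example has no directions to compare.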