ROOct 3, 2023Code
Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiencyFrancisco Roldan Sanchez, Qiang Wang, David Cordova Bulens et al.
Hindsight Experience Replay (HER) is a technique used in reinforcement learning (RL) that has proven to be very efficient for training off-policy RL-based agents to solve goal-based robotic manipulation tasks using sparse rewards. Even though HER improves the sample efficiency of RL-based agents by learning from mistakes made in past experiences, it does not provide any guidance while exploring the environment. This leads to very large training times due to the volume of experience required to train an agent using this replay strategy. In this paper, we propose a method that uses primitive behaviours that have been previously learned to solve simple tasks in order to guide the agent toward more rewarding actions during exploration while learning other more complex tasks. This guidance, however, is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed by the previously-learned primitive policies. We evaluate our method by comparing its performance against HER and other more efficient variations of this algorithm in several block manipulation tasks. We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time. Code is available at https://github.com/franroldans/qmp-her.
25.3LGMar 13
A Spectral Revisit of the Distributional Bellman Operator under the Cramér MetricKeru Wang, Yixin Deng, Yao Lyu et al.
Distributional reinforcement learning (DRL) studies the evolution of full return distributions under Bellman updates rather than focusing on expected values. A classical result is that the distributional Bellman operator is contractive under the Cramér metric, which corresponds to an $L^2$ geometry on differences of cumulative distribution functions (CDFs). While this contraction ensures stability of policy evaluation, existing analyses remain largely metric, focusing on contraction properties without elucidating the structural action of the Bellman update on distributions. In this work, we analyse distributional Bellman dynamics directly at the level of CDFs, treating the Cramér geometry as the intrinsic analytical setting. At this level, the Bellman update acts affinely on CDFs and linearly on differences between CDFs, and its contraction property yields a uniform bound on this linear action. Building on this intrinsic formulation, we construct a family of regularised spectral Hilbert representations that realise the CDF-level geometry by exact conjugation, without modifying the underlying Bellman dynamics. The regularisation affects only the geometry and vanishes in the zero-regularisation limit, recovering the native Cramér metric. This framework clarifies the operator structure underlying distributional Bellman updates and provides a foundation for further functional and operator-theoretic analyses in DRL.
HCAug 1, 2019
Visual cues in estimation of part-to-whole comparisonStephen Redmond
Pie charts were first published in 1801 by William Playfair and have caused some controversy since. Despite the suggestions of many experts against their use, several empirical studies have shown that pie charts are at least as good as alternatives. From Brinton to Few on one side and Eells to Kosara on the other, there appears to have been a hundred-year war waged on the humble pie. In this paper a set of experiments are reported that compare the performance of pie charts and horizontal bar charts with various visual cues. Amazon's Mechanical Turk service was employed to perform the tasks of estimating segments in various part-to-whole charts. The results lead to recommendations for data visualization professionals in developing dashboards.