Situational Awareness by Risk-Conscious Skills
This work addresses risk-aware decision-making in hierarchical RL for domains like robotics or games, representing an incremental advancement by integrating risk sensitivity into an existing framework.
The paper incorporates risk sensitivity into hierarchical reinforcement learning by defining and learning risk-aware skills in a Probabilistic Goal Semi-Markov Decision Process (PG-SMDP). The resulting SARiCoS algorithm comes with a theoretical convergence guarantee, and in a RoboCup soccer domain the learned skills exhibit complex human-like behaviors such as 'time-wasting'.
Hierarchical Reinforcement Learning has previously been shown to speed up the convergence rate of RL planning algorithms as well as mitigate feature-based model misspecification (Mankowitz et al. 2016a,b; Bacon 2015). To do so, it utilizes hierarchical abstractions, also known as skills, a type of temporally extended action (Sutton et al. 1999), to plan at a higher level, abstracting away from the lower-level details. We incorporate risk sensitivity, also referred to as Situational Awareness (SA), into hierarchical RL for the first time by defining and learning risk-aware skills in a Probabilistic Goal Semi-Markov Decision Process (PG-SMDP). This is achieved using our novel Situational Awareness by Risk-Conscious Skills (SARiCoS) algorithm, which comes with a theoretical convergence guarantee. We show in a RoboCup soccer domain that the learned risk-aware skills exhibit complex human-like behaviors such as 'time-wasting' in a soccer game. In addition, the learned risk-aware skills are able to mitigate reward-based model misspecification.
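To make the probabilistic-goal idea concrete, the following toy sketch compares two hand-written skills by the estimated probability that the final goal difference reaches a threshold, rather than by expected return. It is not the SARiCoS algorithm and does not learn anything: the skill names (`attack`, `hold`), the per-step probabilities, and the threshold `beta` are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical per-step outcome probabilities for two hand-written skills.
# These numbers are invented for illustration and are not from the paper.
SKILLS = {
    "attack": {"score": 0.10, "concede": 0.08},  # higher mean, higher variance
    "hold":   {"score": 0.01, "concede": 0.01},  # "time-wasting": little happens
}

def rollout(skill, steps, rng):
    """Return the change in goal difference after running one skill for `steps`."""
    p = SKILLS[skill]
    change = 0
    for _ in range(steps):
        u = rng.random()
        if u < p["score"]:
            change += 1
        elif u < p["score"] + p["concede"]:
            change -= 1
    return change

def prob_goal(skill, lead, steps, beta=1, n=20000, seed=0):
    """Monte Carlo estimate of P(final goal difference >= beta)."""
    rng = random.Random(seed)
    hits = sum(lead + rollout(skill, steps, rng) >= beta for _ in range(n))
    return hits / n

def best_skill(lead, steps):
    """Pick the skill with the highest estimated probability of reaching the goal."""
    return max(SKILLS, key=lambda s: prob_goal(s, lead, steps))
```

With these made-up numbers, `attack` has the higher expected return in every situation, so a risk-neutral agent would always attack. Under the probabilistic-goal criterion, `best_skill(lead=1, steps=10)` prefers `hold` (protecting a lead by wasting time), while `best_skill(lead=-1, steps=10)` prefers `attack`.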