AI · Nov 20, 2017

Situationally Aware Options

arXiv:1711.07832v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the need for more adaptive and reusable hierarchical abstractions in reinforcement learning, particularly for domains like robotics or games, though it appears incremental as it builds on existing option frameworks.

The paper addresses how reinforcement learning agents can adapt their behavior to the current situation by learning reusable options whose parameters, such as vigor, vary within hierarchical RL. The results show that these situationally aware options lead to human-like behaviors such as 'time-wasting' in a RoboCup soccer domain and help mitigate feature-based model misspecification in a Bottomless Pit of Death domain.

Hierarchical abstractions, also known as options, are a type of temporally extended action (Sutton et al. 1999) that enable a reinforcement learning agent to plan at a higher level, abstracting away from the lower-level details. In this work, we learn reusable options whose parameters can vary based on the current situation, encouraging different behaviors. In principle, these behaviors can include vigor, defence or even risk-averseness. These are some examples of what we refer to in the broader context as Situational Awareness (SA). We incorporate SA, in the form of vigor, into hierarchical RL by defining and learning situationally aware options in a Probabilistic Goal Semi-Markov Decision Process (PG-SMDP). This is achieved using our Situationally Aware oPtions (SAP) policy gradient algorithm, which comes with a theoretical convergence guarantee. We learn reusable options in different scenarios (i.e., winning/losing) in a RoboCup soccer domain. These options learn to execute with different levels of vigor, resulting in human-like behaviours such as 'time-wasting' in the winning scenario. We show the potential of the agent to exit bad local optima using reusable options in RoboCup. Finally, using SAP, the agent mitigates feature-based model misspecification in a Bottomless Pit of Death domain.
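
To make the idea of an option with a situation-dependent parameter concrete, here is a minimal Python sketch. It is not the paper's SAP algorithm and does not model a PG-SMDP. It only illustrates the standard option triple from Sutton et al. (initiation set, intra-option policy, termination condition) with an extra vigor argument. The 1-D corridor, the names Option, run_option, move_to_goal and choose_vigor, and the hand-written vigor rule are all assumptions made for illustration. In the paper the vigor level is learned rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Option:
    """Option triple in the sense of Sutton et al. (1999); field names are hypothetical."""
    can_start: Callable[[int], bool]      # initiation set: states where the option may begin
    policy: Callable[[int, float], int]   # intra-option policy: (state, vigor) -> step size
    should_stop: Callable[[int], bool]    # termination condition (deterministic in this sketch)


def run_option(option: Option, state: int, vigor: float,
               max_steps: int = 100) -> Tuple[int, int]:
    """Run the option until it terminates; return (final_state, steps_taken)."""
    if not option.can_start(state):
        raise ValueError("option is not available in this state")
    steps = 0
    while not option.should_stop(state) and steps < max_steps:
        state += option.policy(state, vigor)
        steps += 1
    return state, steps


GOAL = 12  # hypothetical goal position in a 1-D corridor

# The same reusable option serves both scenarios; only its vigor parameter changes.
move_to_goal = Option(
    can_start=lambda s: s < GOAL,
    # Step size grows with vigor but never overshoots the goal.
    policy=lambda s, vigor: min(max(1, round(vigor)), GOAL - s),
    should_stop=lambda s: s >= GOAL,
)


def choose_vigor(winning: bool) -> float:
    """Illustrative hand-written rule (an assumption): low vigor when winning, high when losing."""
    return 1.0 if winning else 3.0


if __name__ == "__main__":
    print(run_option(move_to_goal, 0, choose_vigor(winning=True)))   # slow, 'time-wasting' run
    print(run_option(move_to_goal, 0, choose_vigor(winning=False)))  # fast, vigorous run
```

Under these assumptions the same option reaches the goal in both scenarios but takes more steps when winning, a toy analogue of the 'time-wasting' behaviour described above.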

