LGMar 15, 2024
Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features CriticsLuca Grillotti, Maxence Faldor, Borja G. León et al.
A key aspect of intelligence is the ability to demonstrate a broad spectrum of behaviors for adapting to unexpected situations. Over the past decade, advancements in deep reinforcement learning have led to groundbreaking achievements to solve complex continuous control tasks. However, most approaches return only one solution specialized for a specific problem. We introduce Quality-Diversity Actor-Critic (QDAC), an off-policy actor-critic deep reinforcement learning algorithm that leverages a value function critic and a successor features critic to learn high-performing and diverse behaviors. In this framework, the actor optimizes an objective that seamlessly unifies both critics using constrained optimization to (1) maximize return, while (2) executing diverse skills. Compared with other Quality-Diversity methods, QDAC achieves significantly higher performance and more diverse behaviors on six challenging continuous control locomotion tasks. We also demonstrate that we can harness the learned skills to adapt better than other baselines to five perturbed environments. Finally, qualitative analyses showcase a range of remarkable behaviors: adaptive-intelligent-robotics.github.io/QDAC.
AIOct 18, 2021
In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal SpecificationsBorja G. León, Murray Shanahan, Francesco Belardinelli
We address the problem of building agents whose goal is to learn to execute out-of distribution (OOD) multi-task instructions expressed in temporal logic (TL) by using deep reinforcement learning (DRL). Recent works provided evidence that the agent's neural architecture is a key feature when DRL agents are learning to solve OOD tasks in TL. Yet, the studies on this topic are still in their infancy. In this work, we propose a new deep learning configuration with inductive biases that lead agents to generate latent representations of their current goal, yielding a stronger generalization performance. We use these latent-goal networks within a neuro-symbolic framework that executes multi-task formally-defined instructions and contrast the performance of the proposed neural networks against employing different state-of-the-art (SOTA) architectures when generalizing to unseen instructions in OOD environments.
MAFeb 2, 2021
An Abstraction-based Method to Check Multi-Agent Deep Reinforcement-Learning BehaviorsPierre El Mqirmi, Francesco Belardinelli, Borja G. León
Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviours of the learning agents, and therefore it is generally not adapted to safety-critical applications. To address this issue, we present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally-specified safety constraints both in training and testing. The approach we propose expresses the constraints to verify in Probabilistic Computation Tree Logic (PCTL) and builds an abstract representation of the system to reduce the complexity of the verification step. This abstract model allows for model checking techniques to identify a set of abstract policies that meet the safety constraints expressed in PCTL. Then, the agents' behaviours are restricted according to these safe abstract policies. We provide formal guarantees that by using this method, the actions of the agents always meet the safety constraints, and provide a procedure to generate an abstract model automatically. We empirically evaluate and show the effectiveness of our method in a multi-agent environment.
LGJun 12, 2020
Systematic Generalisation through Task Temporal Logic and Deep Reinforcement LearningBorja G. León, Murray Shanahan, Francesco Belardinelli
This work introduces a neuro-symbolic agent that combines deep reinforcement learning (DRL) with temporal logic (TL) to achieve systematic zero-shot, i.e., never-seen-before, generalisation of formally specified instructions. In particular, we present a neuro-symbolic framework where a symbolic module transforms TL specifications into a form that helps the training of a DRL agent targeting generalisation, while a neural module learns systematically to solve the given tasks. We study the emergence of systematic learning in different settings and find that the architecture of the convolutional layers is key when generalising to new instructions. We also provide evidence that systematic learning can emerge with abstract operators such as negation when learning from a few training examples, which previous research have struggled with.
AIFeb 14, 2020
Extended Markov Games to Learn Multiple Tasks in Multi-Agent Reinforcement LearningBorja G. León, Francesco Belardinelli
The combination of Formal Methods with Reinforcement Learning (RL) has recently attracted interest as a way for single-agent RL to learn multiple-task specifications. In this paper we extend this convergence to multi-agent settings and formally define Extended Markov Games as a general mathematical model that allows multiple RL agents to concurrently learn various non-Markovian specifications. To introduce this new model we provide formal definitions and proofs as well as empirical tests of RL algorithms running on this framework. Specifically, we use our model to train two different logic-based multi-agent RL algorithms to solve diverse settings of non-Markovian co-safe LTL specifications.