AIAug 7, 2023Code
QDax: A Library for Quality-Diversity and Population-based Algorithms with Hardware AccelerationFelix Chalumeau, Bryan Lim, Raphael Boige et al. · ibm-research
QDax is an open-source library with a streamlined and modular API for Quality-Diversity (QD) optimization algorithms in Jax. The library serves as a versatile tool for optimization purposes, ranging from black-box optimization to continuous control. QDax offers implementations of popular QD, Neuroevolution, and Reinforcement Learning (RL) algorithms, supported by various examples. All the implementations can be just-in-time compiled with Jax, facilitating efficient execution across multiple accelerators, including GPUs and TPUs. These implementations effectively demonstrate the framework's flexibility and user-friendliness, easing experimentation for research purposes. Furthermore, the library is thoroughly documented and tested with 95\% coverage.
NENov 4, 2022Code
Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement LearningManon Flageat, Bryan Lim, Luca Grillotti et al. · ibm-research
We present a Quality-Diversity benchmark suite for Deep Neuroevolution in Reinforcement Learning domains for robot control. The suite includes the definition of tasks, environments, behavioral descriptors, and fitness. We specify different benchmarks based on the complexity of both the task and the agent controlled by a deep neural network. The benchmark uses standard Quality-Diversity metrics, including coverage, QD-score, maximum fitness, and an archive profile metric to quantify the relation between coverage and fitness. We also present how to quantify the robustness of the solutions with respect to environmental stochasticity by introducing corrected versions of the same metrics. We believe that our benchmark is a valuable tool for the community to compare and improve their findings. The source code is available online: https://github.com/adaptive-intelligent-robotics/QDax
LGNov 3, 2023
Mix-ME: Quality-Diversity for Multi-Agent LearningGarðar Ingvarsson, Mikayel Samvelyan, Bryan Lim et al. · deepmind
In many real-world systems, such as adaptive robotics, achieving a single, optimised solution may be insufficient. Instead, a diverse set of high-performing solutions is often required to adapt to varying contexts and requirements. This is the realm of Quality-Diversity (QD), which aims to discover a collection of high-performing solutions, each with their own unique characteristics. QD methods have recently seen success in many domains, including robotics, where they have been used to discover damage-adaptive locomotion controllers. However, most existing work has focused on single-agent settings, despite many tasks of interest being multi-agent. To this end, we introduce Mix-ME, a novel multi-agent variant of the popular MAP-Elites algorithm that forms new solutions using a crossover-like operator by mixing together agents from different teams. We evaluate the proposed methods on a variety of partially observable continuous control tasks. Our evaluation shows that these multi-agent variants obtained by Mix-ME not only compete with single-agent baselines but also often outperform them in multi-agent settings under partial observability.
LGMar 10, 2023
Understanding the Synergies between Quality-Diversity and Deep Reinforcement LearningBryan Lim, Manon Flageat, Antoine Cully
The synergies between Quality-Diversity (QD) and Deep Reinforcement Learning (RL) have led to powerful hybrid QD-RL algorithms that have shown tremendous potential, and brings the best of both fields. However, only a single deep RL algorithm (TD3) has been used in prior hybrid methods despite notable progress made by other RL algorithms. Additionally, there are fundamental differences in the optimization procedures between QD and RL which would benefit from a more principled approach. We propose Generalized Actor-Critic QD-RL, a unified modular framework for actor-critic deep RL methods in the QD-RL setting. This framework provides a path to study insights from Deep RL in the QD-RL setting, which is an important and efficient way to make progress in QD-RL. We introduce two new algorithms, PGA-ME (SAC) and PGA-ME (DroQ) which apply recent advancements in Deep RL to the QD-RL setting, and solves the humanoid environment which was not possible using existing QD-RL algorithms. However, we also find that not all insights from Deep RL can be effectively translated to QD-RL. Critically, this work also demonstrates that the actor-critic models in QD-RL are generally insufficiently trained and performance gains can be achieved without any additional environment evaluations.
NEApr 7, 2023
Don't Bet on Luck Alone: Enhancing Behavioral Reproducibility of Quality-Diversity Solutions in Uncertain DomainsLuca Grillotti, Manon Flageat, Bryan Lim et al.
Quality-Diversity (QD) algorithms are designed to generate collections of high-performing solutions while maximizing their diversity in a given descriptor space. However, in the presence of unpredictable noise, the fitness and descriptor of the same solution can differ significantly from one evaluation to another, leading to uncertainty in the estimation of such values. Given the elitist nature of QD algorithms, they commonly end up with many degenerate solutions in such noisy settings. In this work, we introduce Archive Reproducibility Improvement Algorithm (ARIA); a plug-and-play approach that improves the reproducibility of the solutions present in an archive. We propose it as a separate optimization module, relying on natural evolution strategies, that can be executed on top of any QD algorithm. Our module mutates solutions to (1) optimize their probability of belonging to their niche, and (2) maximize their fitness. The performance of our method is evaluated on various tasks, including a classical optimization problem and two high-dimensional control tasks in simulated robotic environments. We show that our algorithm enhances the quality and descriptor space coverage of any given archive by at least 50%.
NENov 22, 2022
Efficient Exploration using Model-Based Quality-Diversity with GradientsBryan Lim, Manon Flageat, Antoine Cully
Exploration is a key challenge in Reinforcement Learning, especially in long-horizon, deceptive and sparse-reward environments. For such applications, population-based approaches have proven effective. Methods such as Quality-Diversity deals with this by encouraging novel solutions and producing a diversity of behaviours. However, these methods are driven by either undirected sampling (i.e. mutations) or use approximated gradients (i.e. Evolution Strategies) in the parameter space, which makes them highly sample-inefficient. In this paper, we propose a model-based Quality-Diversity approach. It extends existing QD methods to use gradients for efficient exploitation and leverage perturbations in imagination for efficient exploration. Our approach optimizes all members of a population simultaneously to maintain both performance and diversity efficiently by leveraging the effectiveness of QD algorithms as good data generators to train deep models. We demonstrate that it maintains the divergent search capabilities of population-based approaches on tasks with deceptive rewards while significantly improving their sample efficiency and quality of solutions.
NEMar 10, 2023
Enhancing MAP-Elites with Multiple Parallel Evolution StrategiesManon Flageat, Bryan Lim, Antoine Cully
With the development of fast and massively parallel evaluations in many domains, Quality-Diversity (QD) algorithms, that already proved promising in a large range of applications, have seen their potential multiplied. However, we have yet to understand how to best use a large number of evaluations as using them for random variations alone is not always effective. High-dimensional search spaces are a typical situation where random variations struggle to effectively search. Another situation is uncertain settings where solutions can appear better than they truly are and naively evaluating more solutions might mislead QD algorithms. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD algorithm based on Evolution Strategies (ES) designed to exploit fast parallel evaluations more effectively. MEMES maintains multiple (up to 100) simultaneous ES processes, each with its own independent objective and reset mechanism designed for QD optimisation, all on just a single GPU. We show that MEMES outperforms both gradient-based and mutation-based QD algorithms on black-box optimisation and QD-Reinforcement-Learning tasks, demonstrating its benefit across domains. Additionally, our approach outperforms sampling-based QD methods in uncertain domains when given the same evaluation budget. Overall, MEMES generates reproducible solutions that are high-performing and diverse through large-scale ES optimisation on easily accessible hardware.
RODec 17, 2025
Remotely Detectable Robot Policy WatermarkingMichael Amir, Manon Flageat, Amanda Prorok
The success of machine learning for real-world robotic systems has created a new form of intellectual property: the trained policy. This raises a critical need for novel methods that verify ownership and detect unauthorized, possibly unsafe misuse. While watermarking is established in other domains, physical policies present a unique challenge: remote detection. Existing methods assume access to the robot's internal state, but auditors are often limited to external observations (e.g., video footage). This ``Physical Observation Gap'' means the watermark must be detected from signals that are noisy, asynchronous, and filtered by unknown system dynamics. We formalize this challenge using the concept of a \textit{glimpse sequence}, and introduce Colored Noise Coherency (CoNoCo), the first watermarking strategy designed for remote detection. CoNoCo embeds a spectral signal into the robot's motions by leveraging the policy's inherent stochasticity. To show it does not degrade performance, we prove CoNoCo preserves the marginal action distribution. Our experiments demonstrate strong, robust detection across various remote modalities, including motion capture and side-way/top-down video footage, in both simulated and real-world robot experiments. This work provides a necessary step toward protecting intellectual property in robotics, offering the first method for validating the provenance of physical policies non-invasively, using purely remote observations.
77.9MAMay 12
Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement LearningHannes Büchi, Manon Flageat, Eduardo Sebastián et al.
Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve-and to do so at the right moment. Yet, current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fixed agent identities. Consequently, they are ill-equipped for tasks where agents need to take on different roles at very specific moments in time. We argue that, to define these behavioral transitions, the missing ingredient is events. Events are changes in the state of the system that induce qualitative changes in the task. Based on this view, we introduce a framework that decouples agent identity from behavior, capturing a continuous manifold from which agents instantiate their behaviors in response to events. This framework is based on two elements. First, to build an expressive behavior manifold, we introduce Neural Manifold Diversity (NMD), a formal distance metric that remains well-defined when behaviors are transient and agent-agnostic. Second, we use an event-based hypernetwork that generates Low-Rank Adaptation (LoRA) modules over a shared team policy, enabling on-the-fly agent-policy reconfiguration in response to events. We prove that this construction ensures that diversity does not interfere with reward maximization by design. Empirical results demonstrate that our framework outperforms established baselines across benchmarks while exhibiting zero-shot generalization, and being the only method that solves tasks requiring sequential behavior reassignment.
NEApr 24, 2024
Large Language Models as In-context AI Generators for Quality-DiversityBryan Lim, Manon Flageat, Antoine Cully
Quality-Diversity (QD) approaches are a promising direction to develop open-ended processes as they can discover archives of high-quality solutions across diverse niches. While already successful in many applications, QD approaches usually rely on combining only one or two solutions to generate new candidate solutions. As observed in open-ended processes such as technological evolution, wisely combining large diversity of these solutions could lead to more innovative solutions and potentially boost the productivity of QD search. In this work, we propose to exploit the pattern-matching capabilities of generative models to enable such efficient solution combinations. We introduce In-context QD, a framework of techniques that aim to elicit the in-context capabilities of pre-trained Large Language Models (LLMs) to generate interesting solutions using few-shot and many-shot prompting with quality-diverse examples from the QD archive as context. Applied to a series of common QD domains, In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization. Additionally, this result holds across multiple values of parameter sizes and archive population sizes, as well as across domains with distinct characteristics from BBO functions to policy search. Finally, we perform an extensive ablation that highlights the key prompt design considerations that encourage the generation of promising solutions for QD.
LGDec 12, 2023
Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning AlgorithmsManon Flageat, Bryan Lim, Antoine Cully
Many applications in Reinforcement Learning (RL) usually have noise or stochasticity present in the environment. Beyond their impact on learning, these uncertainties lead the exact same policy to perform differently, i.e. yield different return, from one roll-out to another. Common evaluation procedures in RL summarise the consequent return distributions using solely the expected return, which does not account for the spread of the distribution. Our work defines this spread as the policy reproducibility: the ability of a policy to obtain similar performance when rolled out many times, a crucial property in some real-world applications. We highlight that existing procedures that only use the expected return are limited on two fronts: first an infinite number of return distributions with a wide range of performance-reproducibility trade-offs can have the same expected return, limiting its effectiveness when used for comparing policies; second, the expected return metric does not leave any room for practitioners to choose the best trade-off value for considered applications. In this work, we address these limitations by recommending the use of Lower Confidence Bound, a metric taken from Bayesian optimisation that provides the user with a preference parameter to choose a desired performance-reproducibility trade-off. We also formalise and quantify policy reproducibility, and demonstrate the benefit of our metrics using extensive experiments of popular RL algorithms on common uncertain RL tasks.
NEDec 10, 2023
Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement LearningMaxence Faldor, Félix Chalumeau, Manon Flageat et al.
A hallmark of intelligence is the ability to exhibit a wide range of effective behaviors. Inspired by this principle, Quality-Diversity algorithms, such as MAP-Elites, are evolutionary methods designed to generate a set of diverse and high-fitness solutions. However, as a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces, thus limiting its scalability to more complex domains, such as learning to control agents directly from high-dimensional inputs. To address this limitation, advanced methods like PGA-MAP-Elites and DCG-MAP-Elites have been developed, which combine actor-critic techniques from Reinforcement Learning with MAP-Elites, significantly enhancing the performance and efficiency of Quality-Diversity algorithms in complex, high-dimensional tasks. While these methods have successfully leveraged the trained critic to guide more effective mutations, the potential of the trained actor remains underutilized in improving both the quality and diversity of the evolved population. In this work, we introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that utilizes the descriptor-conditioned actor as a generative model to produce diverse solutions, which are then injected into the offspring batch at each generation. Additionally, we present an empirical analysis of the fitness and descriptor reproducibility of the solutions discovered by each algorithm. Finally, we present a second empirical analysis shedding light on the synergies between the different variations operators and explaining the performance improvement from PGA-MAP-Elites to DCRL-MAP-Elites.
NEJun 25, 2020
Fast and stable MAP-Elites in noisy domains using deep gridsManon Flageat, Antoine Cully
Quality-Diversity optimisation algorithms enable the evolution of collections of both high-performing and diverse solutions. These collections offer the possibility to quickly adapt and switch from one solution to another in case it is not working as expected. It therefore finds many applications in real-world domain problems such as robotic control. However, QD algorithms, like most optimisation algorithms, are very sensitive to uncertainty on the fitness function, but also on the behavioural descriptors. Yet, such uncertainties are frequent in real-world applications. Few works have explored this issue in the specific case of QD algorithms, and inspired by the literature in Evolutionary Computation, mainly focus on using sampling to approximate the "true" value of the performances of a solution. However, sampling approaches require a high number of evaluations, which in many applications such as robotics, can quickly become impractical. In this work, we propose Deep-Grid MAP-Elites, a variant of the MAP-Elites algorithm that uses an archive of similar previously encountered solutions to approximate the performance of a solution. We compare our approach to previously explored ones on three noisy tasks: a standard optimisation task, the control of a redundant arm and a simulated Hexapod robot. The experimental results show that this simple approach is significantly more resilient to noise on the behavioural descriptors, while achieving competitive performances in terms of fitness optimisation, and being more sample-efficient than other existing approaches.