AI | Nov 19, 2024

Preference-Conditioned Gradient Variations for Multi-Objective Quality-Diversity

Hannah Janmohamed, Maxence Faldor, Thomas Pierrot, Antoine Cully

arXiv:2411.12433v14.22 citations | h-index: 7 | ACM Trans Evol Learn Optim

Originality: Incremental advance

AI Analysis

This work addresses a problem relevant to robotics and other domains that require diverse, high-performing solutions in multi-objective settings. It is an incremental improvement over prior methods.

The paper targets a limitation of existing Multi-Objective Quality-Diversity algorithms: inefficient search in high-dimensional spaces. It introduces a new algorithm that combines preference-conditioned policy-gradient mutations with crowding mechanisms. On six robotics locomotion tasks, the method outperforms or matches state-of-the-art methods, with smoother trade-offs and a lower computational storage cost.

In a variety of domains, from robotics to finance, Quality-Diversity algorithms have been used to generate collections of both diverse and high-performing solutions. Multi-Objective Quality-Diversity algorithms have emerged as a promising approach for applying these methods to complex, multi-objective problems. However, existing methods are limited by their search capabilities. For example, Multi-Objective Map-Elites depends on random genetic variations which struggle in high-dimensional search spaces. Despite efforts to enhance search efficiency with gradient-based mutation operators, existing approaches consider updating solutions to improve on each objective separately rather than achieving desired trade-offs. In this work, we address this limitation by introducing Multi-Objective Map-Elites with Preference-Conditioned Policy-Gradient and Crowding Mechanisms: a new Multi-Objective Quality-Diversity algorithm that uses preference-conditioned policy-gradient mutations to efficiently discover promising regions of the objective space and crowding mechanisms to promote a uniform distribution of solutions on the Pareto front. We evaluate our approach on six robotics locomotion tasks and show that our method outperforms or matches all state-of-the-art Multi-Objective Quality-Diversity methods in all six, including two newly proposed tri-objective tasks. Importantly, our method also achieves a smoother set of trade-offs, as measured by newly-proposed sparsity-based metrics. This performance comes at a lower computational storage cost compared to previous methods.
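The two ingredients named in the abstract, preference-weighted trade-offs and crowding on the Pareto front, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which crowding measure is used, so the NSGA-II-style crowding distance below is an assumption, and the function names (`pareto_mask`, `crowding_distances`, `scalarise`) and the toy fitness values are hypothetical. All objectives are assumed to be maximised.

```python
import numpy as np


def pareto_mask(F):
    """Boolean mask of the non-dominated rows of F (objectives are maximised)."""
    F = np.asarray(F, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is >= on every objective and > on at least one.
        dominates_i = np.all(F >= F[i], axis=1) & np.any(F > F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask


def crowding_distances(F):
    """NSGA-II-style crowding distance per row (assumed measure, see lead-in)."""
    F = np.asarray(F, dtype=float)
    n, m = F.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        span = F[order[-1], k] - F[order[0], k]
        # Boundary solutions are always kept, so they get infinite distance.
        dist[order[0]] = dist[order[-1]] = np.inf
        if n < 3 or span == 0:
            continue
        # Interior solutions: normalised gap between their two neighbours.
        dist[order[1:-1]] += (F[order[2:], k] - F[order[:-2], k]) / span
    return dist


def scalarise(F, preference):
    """Weighted sum of objectives under a preference vector (weights sum to 1)."""
    return np.asarray(F, dtype=float) @ np.asarray(preference, dtype=float)


# Toy bi-objective fitnesses for five solutions sharing one archive cell.
fitnesses = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 3.0], [2.0, 2.0], [5.0, 1.0]])

front = fitnesses[pareto_mask(fitnesses)]
distances = crowding_distances(front)
most_crowded = int(np.argmin(distances))  # candidate to drop if the cell is full
preferred = int(np.argmax(scalarise(front, [0.8, 0.2])))  # best under this preference
```

Dropping the solution with the smallest crowding distance when a cell is full favours an evenly spread front, while the preference vector picks out which trade-off a given update should aim for.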
