LG, AI, NE | Oct 10, 2023

Diversity from Human Feedback

arXiv:2310.06648v3 | 5 citations | h-index: 13
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of incorporating human preferences into diversity measures for problems such as ensemble learning and optimization. The advance is incremental because it builds on existing Quality-Diversity methods.

The paper tackles the problem of defining diversity measures by proposing DivHF, which learns a behavior space from human feedback. It shows that the learned behavior is more consistent with human preferences, and that the resulting solutions are more diverse under those preferences, than with data-driven approaches that use no feedback.

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that the behavior learned by DivHF is much more consistent with human requirements than the one learned by direct data-driven approaches without human feedback, and makes the final solutions more diverse under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
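The idea in the abstract (learn a behavior descriptor from human feedback, then combine it with a distance measure to get a diversity measure) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feedback is modeled as triplets ("`closer` is more similar to `anchor` than `farther` is"), the descriptor is a linear map `W`, the loss is a squared-distance hinge, and all function names are hypothetical.

```python
import numpy as np


def describe(W, x):
    """Hypothetical linear behavior descriptor: project solution features with W."""
    return np.asarray(x, dtype=float) @ W


def triplet_loss_and_grad(W, anchor, closer, farther, margin=1.0):
    """Hinge loss for one (assumed) human judgment.

    The judgment says `closer` should lie nearer to `anchor` than `farther`
    does in the learned behavior space. Returns (loss, dloss/dW).
    """
    u = np.asarray(anchor, dtype=float) - np.asarray(closer, dtype=float)
    v = np.asarray(anchor, dtype=float) - np.asarray(farther, dtype=float)
    du, dv = u @ W, v @ W
    loss = margin + du @ du - dv @ dv
    if loss <= 0.0:
        # The judgment is already satisfied with the required margin.
        return 0.0, np.zeros_like(W)
    grad = 2.0 * (np.outer(u, du) - np.outer(v, dv))
    return float(loss), grad


def fit_descriptor(triplets, W0, lr=0.05, epochs=50):
    """Plain SGD over the collected triplets, starting from W0."""
    W = np.array(W0, dtype=float)  # copy so the caller's array is untouched
    for _ in range(epochs):
        for anchor, closer, farther in triplets:
            _, grad = triplet_loss_and_grad(W, anchor, closer, farther)
            W -= lr * grad
    return W


def diversity(W, solutions):
    """Mean pairwise Euclidean distance between behavior descriptors."""
    B = describe(W, solutions)
    diffs = B[:, None, :] - B[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(B)
    return dists.sum() / (n * (n - 1))


if __name__ == "__main__":
    # One toy judgment: relative to [1, 1], [0, 1] is "closer" than [1, 0],
    # so differences along the second feature should count for more.
    triplets = [([1.0, 1.0], [0.0, 1.0], [1.0, 0.0])]
    W = fit_descriptor(triplets, W0=[[1.0], [1.0]], lr=0.1, epochs=1)
    print("learned W:", W.ravel())
    print("diversity:", diversity(W, [[0.0, 0.0], [3.0, 5.0]]))
```

Because the descriptor and the distance are separate pieces, the Euclidean distance in `diversity` could be swapped for another measure without touching the learning step, which mirrors the abstract's claim that the learned descriptor can be combined with any distance measure. In a Quality-Diversity loop such as MAP-Elites, `describe` would play the role of the behavior descriptor used to place solutions in the archive.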

