LG, Aug 22, 2024

Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation

arXiv:2408.12110v1, 4.61 citations, h-index: 6

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of generating diverse expert policies for sequential decision-making problems such as autonomous driving. The advance is incremental, since it builds on existing IRL methods.

The paper tackles the problem of learning Pareto-optimal policies from limited expert datasets with conflicting objectives. It introduces a Pareto Inverse Reinforcement Learning framework (ParIRL) that generates diverse policies and, in the reported experiments, outperforms other IRL algorithms on multi-objective control tasks.

Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This gap is particularly marked when multiple conflicting objectives exist and each expert holds a unique optimization preference over them, because obtaining comprehensive datasets for all preferences is rarely practical. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates to regularize the discriminator. This enables progressive generation of a set of policies that accommodate diverse preferences over the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. In the framework, the Pareto policy set is then distilled into a single, preference-conditioned diffusion model, allowing users to immediately specify which expert's patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms on various multi-objective control tasks, achieving a dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL to autonomous driving in CARLA.
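To make the notion of a Pareto policy set concrete, the following is a minimal sketch of non-dominated filtering over per-policy returns on two objectives. It is not the ParIRL algorithm itself and does not come from the paper. The function names `dominates` and `pareto_front` and the return values in `policy_returns` are hypothetical and chosen for illustration only, and both objectives are assumed to be maximized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives maximized).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(returns: List[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated points in `returns`."""
    return [
        i
        for i, p in enumerate(returns)
        if not any(dominates(q, p) for j, q in enumerate(returns) if j != i)
    ]


# Hypothetical (objective_1, objective_2) returns for six candidate policies.
policy_returns = [
    (10.0, 1.0),
    (8.0, 4.0),
    (5.0, 5.0),
    (4.0, 4.0),  # dominated by (8.0, 4.0) and (5.0, 5.0)
    (1.0, 9.0),
    (7.0, 3.0),  # dominated by (8.0, 4.0)
]

front = pareto_front(policy_returns)
print(front)  # [0, 1, 2, 4]
```

A denser approximation of the Pareto frontier, in the sense used in the abstract, corresponds to more non-dominated points spread between the two extreme preferences. The quadratic pairwise check here is adequate for small policy sets.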
