LG · Aug 22, 2024

Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation

arXiv:2408.12110v1 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating diverse expert policies for sequential decision-making problems such as autonomous driving. The advance is incremental because it builds on existing IRL methods.

The paper tackles the problem of learning Pareto-optimal policies from limited expert datasets with conflicting objectives. It introduces a Pareto Inverse Reinforcement Learning framework that generates diverse policies and outperforms other IRL algorithms in multi-objective control tasks.

Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This becomes particularly marked due to practical limitations in obtaining comprehensive datasets for all preferences, where multiple conflicting objectives exist and each expert might hold a unique optimization preference for these objectives. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates for regularizing the discriminator. This enables progressive generation of a set of policies that accommodate diverse preferences on the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. In the framework, the Pareto policy set is then distilled into a single, preference-conditioned diffusion model, thus allowing users to immediately specify which expert's patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms for various multi-objective control tasks, achieving the dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL with autonomous driving in CARLA.
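
To make the notion of a Pareto policy set concrete, the following is a minimal sketch of Pareto dominance and non-dominated filtering over per-policy returns. It is not the ParIRL algorithm itself: the function names `dominates` and `pareto_front`, the two-objective setup, and the return values are all hypothetical and chosen only for illustration, assuming every objective is to be maximized.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` under maximization.

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective 1, objective 2) returns for five candidate policies.
returns: List[Point] = [
    (1.0, 9.0),
    (3.0, 7.0),
    (2.0, 6.0),  # dominated by (3.0, 7.0)
    (6.0, 4.0),
    (5.0, 3.0),  # dominated by (6.0, 4.0)
]

front = pareto_front(returns)
print(front)
```

In this toy example the surviving points trade one objective against the other, which is the kind of set a dense approximation of the Pareto frontier aims to cover.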
