LG · May 12

Split the Differences, Pool the Rest: Provably Efficient Multi-Objective Imitation

arXiv:2605.1200062.2
Predicted impact top 36% in LG · last 90 days
Originality: Highly original
AI Analysis

For researchers in imitation learning and multi-objective reinforcement learning, this work provides the first provably efficient algorithm for recovering Pareto-optimal policies from multiple expert demonstrations.

The paper introduces MA-BC, an algorithm for multi-objective imitation learning that partitions conflicting expert data and pools non-conflicting data. It provably converges to Pareto-optimal policies faster than any learner that treats each expert's dataset independently, and a matching lower bound shows it is minimax optimal.
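To make the "split the differences, pool the rest" idea concrete, here is a minimal tabular sketch in Python. It is not the paper's MA-BC: it assumes a finite state space, deterministic experts, and datasets given as lists of (state, action) pairs, and the name `split_and_pool` is hypothetical. At each state, if every expert that visited it chose the same action, that action is pooled and lent to experts that never reached the state; where experts disagree, each expert keeps only its own data.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Set, Tuple

State = Hashable
Action = Hashable
Dataset = List[Tuple[State, Action]]  # one expert's (state, action) pairs


def split_and_pool(datasets: List[Dataset]) -> List[Dict[State, Optional[Action]]]:
    """Hypothetical sketch: one lookup-table policy per expert.

    Assumes deterministic experts (one action per visited state).
    """
    per_expert: List[Dict[State, Action]] = []
    actions_at: Dict[State, Set[Action]] = defaultdict(set)
    for data in datasets:
        table: Dict[State, Action] = {}
        for s, a in data:
            table[s] = a
            actions_at[s].add(a)  # record every action seen at s, across experts
        per_expert.append(table)

    # Pool: states where all experts that visited them agree on a single action.
    pooled = {s: next(iter(acts)) for s, acts in actions_at.items() if len(acts) == 1}

    policies = []
    for table in per_expert:
        policy: Dict[State, Optional[Action]] = {}
        for s in actions_at:
            if s in table:
                policy[s] = table[s]      # split: the expert's own action
            elif s in pooled:
                policy[s] = pooled[s]     # pool: borrow the consensus action
            else:
                policy[s] = None          # conflict at s and no own data
        policies.append(policy)
    return policies


# Usage: experts disagree at s0 but agree (or are alone) elsewhere.
expert_a = [("s0", "left"), ("s1", "up")]
expert_b = [("s0", "right"), ("s2", "up")]
pa, pb = split_and_pool([expert_a, expert_b])
print(pa)  # expert A's table also covers s2, borrowed from expert B
print(pb)
```

Intuitively, pooling is where savings over per-expert learning could come from: an expert's policy gets filled in at states only other experts reached, as long as no one disagreed there. How the actual algorithm handles stochastic experts, function approximation, or unvisited conflict states is not covered by this sketch.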

This work investigates multi-objective imitation learning: the problem of recovering policies that lie on the Pareto front given demonstrations from multiple Pareto-optimal experts in a Multi-Objective Markov Decision Process (MOMDP). Standard imitation approaches are ill-equipped for this regime, as naively aggregating conflicting expert trajectories can result in dominated policies. To address this, we introduce Multi-Output Augmented Behavioral Cloning (MA-BC), an algorithm that systematically partitions divergent expert data while pooling state-action pairs where no behavior conflict is observed. Theoretically, we prove that MA-BC converges to Pareto-optimal policies at a faster statistical rate than any learner that considers each expert dataset independently. Furthermore, we establish a novel lower bound for multi-objective imitation learning, demonstrating that MA-BC is minimax optimal. Finally, we empirically validate our algorithm across diverse discrete environments and, guided by our theoretical insights, extend and evaluate MA-BC on a continuous Linear Quadratic Regulator (LQR) control task.
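The notion of a dominated policy in the abstract can be stated as a short check on expected return vectors (one entry per objective, higher is better). The sketch below is a generic Pareto-dominance filter, not code from the paper; the return vectors in the usage lines are made up for illustration and are not results reported by the authors.

```python
from typing import List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if return vector u Pareto-dominates v (maximization, same length)."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def pareto_front(returns: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the return vectors that no other vector dominates."""
    return [v for v in returns if not any(dominates(u, v) for u in returns)]


# Hypothetical returns: three Pareto-optimal experts and a naive blend of them.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.4)]
print(pareto_front(candidates))  # the blend (0.4, 0.4) is dropped
```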
