Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate Models

Lukas Theiner, Maik Pfefferkorn, Yongpeng Zhao, Sebastian Hirt, Rolf Findeisen

arXiv:2603.24138 · 17.7 · h-index: 7

Predicted impact: top 76% in LG (last 90 days) · Originality: Incremental advance

AI Analysis

This work addresses the challenge of automating control policy tuning for systems involving human preferences, offering an incremental improvement over existing preferential Bayesian optimization methods.

The paper tackles the problem of tuning control policies for systems with subjective criteria. It proposes a multi-fidelity, multi-modal Bayesian optimization framework that integrates low-fidelity numerical data with high-fidelity human preferences, and shows that this combination significantly reduces the number of experiments that involve the human while still adapting effectively to individual preferences.
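In preferential Bayesian optimization, human preferences enter the model as pairwise comparisons of a latent utility rather than as measured values. The listing does not state which preference likelihood the authors use. A common choice is the probit (Thurstone-style) model, in which "x preferred over x′" is observed with probability Φ((f(x) − f(x′)) / (√2 σ)). The sketch below evaluates that log-likelihood for given latent utilities. It assumes this probit model, and the names `preference_log_likelihood`, `std_normal_cdf` and `noise_std` are hypothetical. In a full method, f would be a Gaussian process and posterior inference would need an approximation such as Laplace or expectation propagation. That step is not shown.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def preference_log_likelihood(f, pairs, noise_std=1.0):
    """Probit log-likelihood of pairwise preferences given latent utilities.

    f         -- latent utility value at each candidate controller setting
    pairs     -- (winner, loser) index tuples; the human preferred `winner`
    noise_std -- assumed std of the Gaussian noise on each utility judgement
    """
    total = 0.0
    for winner, loser in pairs:
        # Difference of two noisy utilities has std sqrt(2) * noise_std.
        z = (f[winner] - f[loser]) / (math.sqrt(2.0) * noise_std)
        # Floor the probability so extreme disagreements do not hit log(0).
        total += math.log(max(std_normal_cdf(z), 1e-300))
    return total


# Hypothetical example: three candidate planner settings, two answered queries.
utilities = [0.5, -0.2, 0.1]
queries = [(0, 1), (2, 1)]  # setting 0 beat setting 1; setting 2 beat setting 1
print(preference_log_likelihood(utilities, queries))
```

Utilities that agree with the observed comparisons score higher. The learner only ever sees the winner/loser pairs, never the utility values themselves, and this is why preference data alone carries less information per experiment than numerical evaluations.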

Tuning control policies manually to meet high-level objectives is often time-consuming. Bayesian optimization provides a data-efficient framework for automating this process using numerical evaluations of an objective function. However, many systems, particularly those involving humans, require optimization based on subjective criteria. Preferential Bayesian optimization addresses this by learning from pairwise comparisons instead of quantitative measurements, but relying solely on preference data can be inefficient. We propose a multi-fidelity, multi-modal Bayesian optimization framework that integrates low-fidelity numerical data with high-fidelity human preferences. Our approach employs Gaussian process surrogate models with both hierarchical (autoregressive) and non-hierarchical (coregionalization-based) structures, enabling efficient learning from mixed-modality data. We illustrate the framework by tuning an autonomous vehicle's trajectory planner, showing that combining numerical and preference data significantly reduces the need for experiments involving the human decision maker while effectively adapting driving style to individual preferences.
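The abstract names two surrogate structures but gives no formulas. The sketch below builds the joint prior covariance for a standard form of each and treats the two fidelities as outputs of one GP. It assumes the usual Kennedy–O'Hagan autoregressive model f_high(x) = ρ f_low(x) + δ(x) and the intrinsic coregionalization model K((x, i), (x′, j)) = B_ij k(x, x′). The paper's exact parameterization may differ. Fidelity index 0 tags low-fidelity numerical outputs, and index 1 tags the high-fidelity latent utility that is only accessed through preferences. The function names, hyperparameter values and data are hypothetical.

```python
import numpy as np


def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of X1 and X2."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def autoregressive_cov(X, fid, rho=0.8, ls_low=0.5, ls_delta=0.3, var_delta=0.3):
    """Joint prior covariance under f_high(x) = rho * f_low(x) + delta(x).

    fid[i] == 0 marks a low-fidelity (numerical) output, fid[i] == 1 the
    high-fidelity latent utility; f_low and delta are independent GPs.
    """
    K_low = rbf(X, X, ls_low)
    K_delta = rbf(X, X, ls_delta, var_delta)
    scale = np.where(fid == 1, rho, 1.0)   # weight of f_low in each output
    is_high = (fid == 1).astype(float)     # delta only enters high-fidelity outputs
    return np.outer(scale, scale) * K_low + np.outer(is_high, is_high) * K_delta


def coregionalization_cov(X, fid, W, kappa, ls=0.5):
    """Intrinsic coregionalization: K[(x, i), (x', j)] = B[i, j] * k(x, x')."""
    B = W @ W.T + np.diag(kappa)           # positive definite when all kappa > 0
    return B[np.ix_(fid, fid)] * rbf(X, X, ls)


# Hypothetical data: six controller-parameter vectors, the first four
# evaluated numerically, the last two only compared by the human.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(6, 2))
fid = np.array([0, 0, 0, 0, 1, 1])

K_ar = autoregressive_cov(X, fid)
K_icm = coregionalization_cov(X, fid, W=np.array([[1.0], [0.7]]), kappa=np.array([0.1, 0.2]))

# Both joint covariances should admit a Cholesky factor (small jitter added).
for K in (K_ar, K_icm):
    np.linalg.cholesky(K + 1e-6 * np.eye(len(K)))
```

The two structures differ in direction. In the autoregressive form, information flows one way: the high-fidelity utility inherits the low-fidelity structure scaled by ρ, plus an independent correction δ. The coregionalization form treats both outputs symmetrically and learns their correlation through the off-diagonal entry B[0, 1]. In either case the numerical outputs would typically enter through a Gaussian likelihood and the preference-queried outputs through a non-Gaussian likelihood such as probit, so posterior inference needs an approximation. That part is not sketched here.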
