AI · HC · May 18

Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use

arXiv:2605.19151 · 4.2
AI Analysis

The paper provides a formal framework for trust calibration in human-AI interaction. It addresses when an agent's tool use should be automated and when it should require human approval.

The paper formalizes trust calibration for agentic tool use as a preference-learning problem, using a Gaussian-process posterior over a latent human risk-tolerance function to decide when to automate or seek human approval. It shows this is an instance of Preferential Bayesian Optimization, inheriting its inference and sample-efficiency properties.
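Under the usual Gaussian-process classification assumptions, the model in this summary can be written out as follows. This is a sketch, not the paper's notation. The zero-mean prior, the kernel k, and the thresholds τ_block < τ_allow are assumptions. The closed-form predictive probability is the standard identity for a probit likelihood integrated against a Gaussian approximate posterior.

```latex
\begin{aligned}
f &\sim \mathcal{GP}\bigl(0,\,k(x,x')\bigr)
  && \text{latent risk tolerance over actions } x \\
p(y = 1 \mid f, x) &= \Phi\bigl(f(x)\bigr)
  && y = 1:\ \text{approve},\quad y = 0:\ \text{deny} \\
q\bigl(f(x) \mid \mathcal{D}\bigr) &= \mathcal{N}\bigl(\mu(x),\,\sigma^2(x)\bigr)
  && \text{approximate GP-classification posterior} \\
\bar{p}(x) &= \int \Phi(f)\,\mathcal{N}\bigl(f;\mu(x),\sigma^2(x)\bigr)\,df
  = \Phi\!\left(\frac{\mu(x)}{\sqrt{1+\sigma^2(x)}}\right) \\
\text{gate}(x) &=
\begin{cases}
\text{allow} & \bar{p}(x) \ge \tau_{\text{allow}} \\
\text{block} & \bar{p}(x) \le \tau_{\text{block}} \\
\text{ask}   & \text{otherwise}
\end{cases}
\end{aligned}
```

If the thresholds are placed symmetrically around 1/2, the ask region contains the actions whose predicted approval is closest to a coin flip. This is where a binary approve/deny answer carries the most information.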

We formalize trust calibration for agentic tool use (deciding when an automated agent's proposed action may execute autonomously versus require human approval) as a preference-learning problem. A policy gateway maintains a Gaussian-process posterior over a latent human risk-tolerance function, observed through a probit likelihood on binary approve/deny feedback, and escalates to the human exactly where the approval outcome is most uncertain. We show this is structurally an instance of Preferential Bayesian Optimization, inheriting its inference machinery (approximate Gaussian-process classification) and its sample-efficiency argument (uncertainty-targeted querying), while differing in objective: classifying an action space into allow/block/ask regions rather than optimizing a design.
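The gateway described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the paper's implementation. The assumptions are:

- Actions are encoded as a hypothetical one-dimensional risk feature.
- The squared-exponential kernel and its hyperparameters are assumed.
- Approximate inference uses the Laplace approximation, which is one standard choice for GP classification. The paper may use a different approximation, such as expectation propagation.
- Labels use the ±1 convention.
- The allow/block thresholds 0.9/0.1 and the helper names `approval_prob`, `decide`, and `next_query` are hypothetical.

```python
import math
import numpy as np

# Standard-normal pdf and cdf; math.erf is vectorized so SciPy is not needed.
_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))


def norm_pdf(z):
    return np.exp(-0.5 * np.asarray(z) ** 2) / math.sqrt(2.0 * math.pi)


def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    # Assumed squared-exponential kernel between rows of A (n, d) and B (m, d).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def _probit_terms(f, y):
    # Gradient and negative second derivative (diagonal W) of log Phi(y * f).
    z = y * f
    r = norm_pdf(z) / norm_cdf(z)
    return y * r, r**2 + z * r


def laplace_fit(X, y, iters=30):
    """Laplace approximation to GP classification with a probit likelihood.

    y[i] = +1 means approve, -1 means deny. Newton updates follow
    Rasmussen & Williams (2006), Algorithm 3.1.
    """
    n = len(y)
    K = rbf_kernel(X, X)
    f = np.zeros(n)
    for _ in range(iters):
        grad, W = _probit_terms(f, y)
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + grad
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f = K @ a
    # Re-evaluate the likelihood terms and Cholesky factor at the final mode.
    grad, W = _probit_terms(f, y)
    sW = np.sqrt(W)
    L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
    return grad, sW, L


def approval_prob(X, y, Xs):
    # Predictive P(approve | x*) = Phi(mean / sqrt(1 + var)), R&W Algorithm 3.2.
    grad, sW, L = laplace_fit(X, y)
    Ks = rbf_kernel(X, Xs)
    mean = Ks.T @ grad
    v = np.linalg.solve(L, sW[:, None] * Ks)
    var = np.diag(rbf_kernel(Xs, Xs)) - np.sum(v**2, axis=0)
    return norm_cdf(mean / np.sqrt(1.0 + var))


def decide(p, allow=0.9, block=0.1):
    # Hypothetical thresholds splitting actions into allow / block / ask regions.
    return np.where(p >= allow, "allow", np.where(p <= block, "block", "ask"))


def next_query(p):
    # Uncertainty-targeted escalation: the action whose predicted approval
    # probability is closest to 1/2 (maximal Bernoulli entropy).
    return int(np.argmin(np.abs(p - 0.5)))


if __name__ == "__main__":
    # Hypothetical 1-D action feature: a normalized risk score in [0, 1].
    X = np.array([[0.0], [0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [1.0]])
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)  # approve / deny
    grid = np.linspace(0.0, 1.0, 11)[:, None]
    p = approval_prob(X, y, grid)
    print(np.round(p, 2))
    print(decide(p))
    print("escalate action with risk", grid[next_query(p), 0])
```

Choosing the next escalation at the point where the predicted approval is closest to 1/2 is plain uncertainty sampling. After each human answer, the new (x, y) pair would be appended to `X` and `y` and the posterior refit, so later queries concentrate near the boundary between the allow and block regions.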

