HCAI · Apr 6, 2025

"Trust me on this" Explaining Agent Behavior to a Human Terminator

arXiv:2504.04592v2 · h-index: 18
Originality: Synthesis-oriented
AI Analysis

This paper addresses the trade-off between agent autonomy and human take-overs in human-machine interaction, in domains such as autonomous driving and healthcare, with the aim of making agents both more useful and safer. The contribution appears incremental, since it builds on existing explainability concepts.

The paper formalizes a setting in which a pre-trained agent operates under possible human take-overs, and proposes an explainability scheme that balances the risk of a sub-optimal agent policy against excessive human interventions.

Consider a setting where a pre-trained agent is operating in an environment and a human operator can decide to temporarily terminate its operation and take over for some duration of time. These kinds of scenarios are common in human-machine interaction, for example in autonomous driving, factory automation and healthcare. In these settings, we typically observe a trade-off between two extreme cases: if no take-overs are allowed, then the agent might employ a sub-optimal, possibly dangerous policy. Alternatively, if there are too many take-overs, then the human has no confidence in the agent, greatly limiting its usefulness. In this paper, we formalize this setup and propose an explainability scheme to help optimize the number of human interventions.
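The trade-off described in the abstract can be made concrete with a toy simulation. The sketch below is not the paper's formalization and does not model its explainability scheme; it is a minimal illustration under assumptions invented here: per-step agent rewards are known in advance, the human takes over whenever the agent's reward would fall below a threshold, each take-over has a fixed cost, and the human then stays in control for a fixed number of steps. The names `run_episode`, `takeover_threshold`, `takeover_cost` and `takeover_length` are all hypothetical.

```python
def run_episode(agent_rewards, takeover_threshold,
                human_reward=1.0, takeover_cost=1.0, takeover_length=3):
    """Return (total_reward, number_of_takeovers) for one episode.

    Toy model (an assumption, not the paper's setup):
    - agent_rewards[t] is what the agent would earn at step t if left in control.
    - The human takes over when that reward is below takeover_threshold,
      pays takeover_cost once, and controls the next takeover_length steps,
      earning human_reward per step.
    """
    total = 0.0
    takeovers = 0
    remaining = 0  # steps left in the current human-controlled stretch
    for reward in agent_rewards:
        if remaining == 0 and reward < takeover_threshold:
            takeovers += 1
            total -= takeover_cost
            remaining = takeover_length
        if remaining > 0:
            total += human_reward
            remaining -= 1
        else:
            total += reward
    return total, takeovers


# One dangerous step (-5.0) and one mildly sub-optimal step (0.5).
rewards = [1.0, 1.0, -5.0, 1.0, 1.0, 0.5, 1.0, 1.0]

never = run_episode(rewards, takeover_threshold=float("-inf"))
moderate = run_episode(rewards, takeover_threshold=0.0)
eager = run_episode(rewards, takeover_threshold=0.75)

for label, result in [("never", never), ("moderate", moderate), ("eager", eager)]:
    print(label, result)
```

In this toy episode, the operator who never intervenes absorbs the dangerous step, while the eager operator pays the take-over cost twice for little gain; the moderate threshold does best. The specific numbers only serve to show the two extremes the abstract describes.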

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
