A Q-learning Approach for Adherence-Aware Recommendations
This letter addresses the challenge of improving AI-human collaboration in safety-critical domains, though the contribution appears incremental because it adapts existing Q-learning methods.
The paper tackles the problem of making AI recommendations to human decision-makers in high-stakes scenarios. It develops an adherence-aware Q-learning algorithm that learns how often humans follow recommendations and derives an optimal recommendation policy, proves convergence of the algorithm, and evaluates its performance across scenarios.
In many real-world scenarios involving high-stakes and safety implications, a human decision-maker (HDM) may receive recommendations from an artificial intelligence while holding the ultimate responsibility of making decisions. In this letter, we develop an "adherence-aware Q-learning" algorithm to address this problem. The algorithm learns the "adherence level" that captures the frequency with which an HDM follows the recommended actions and derives the best recommendation policy in real time. We prove the convergence of the proposed Q-learning algorithm to the optimal value and evaluate its performance across various scenarios.
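To make the idea concrete, the following is a minimal tabular sketch of one way an adherence-aware Q-learning loop could look. It is not the letter's exact algorithm. It assumes that the HDM follows the recommendation with a fixed probability and otherwise plays a known baseline action, that the adherence level is estimated as the empirical fraction of followed recommendations on steps where the recommendation differs from the baseline, and that the bootstrap target mixes the greedy value and the baseline value by that estimate. All names (`adherence_aware_q_learning`, `toy_step`, `baseline`, `true_theta`) and the two-state toy environment are hypothetical and chosen only for illustration.

```python
import random


def adherence_aware_q_learning(n_states, n_actions, step, baseline, true_theta,
                               episodes=2000, horizon=20, alpha=0.1,
                               gamma=0.9, epsilon=0.1, seed=0):
    """Illustrative sketch only; the update rule is an assumption, not the letter's."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]  # values of executed actions
    followed = 0       # informative steps where the HDM took the recommendation
    informative = 0    # steps where recommendation and baseline action differ
    theta_hat = 0.5    # initial guess for the adherence level

    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # Epsilon-greedy choice of the recommended action.
            if rng.random() < epsilon:
                rec = rng.randrange(n_actions)
            else:
                rec = max(range(n_actions), key=lambda a: Q[s][a])

            # Simulated HDM: adheres with probability true_theta (unknown to the learner).
            a = rec if rng.random() < true_theta else baseline(s)

            # Update the adherence estimate only when adherence is observable.
            if rec != baseline(s):
                informative += 1
                followed += int(a == rec)
                theta_hat = followed / informative

            s_next, r = step(s, a, rng)

            # Assumed adherence-aware target: the next action is greedy with
            # probability theta_hat and the baseline action otherwise.
            v_next = (theta_hat * max(Q[s_next])
                      + (1.0 - theta_hat) * Q[s_next][baseline(s_next)])
            Q[s][a] += alpha * (r + gamma * v_next - Q[s][a])
            s = s_next

    return Q, theta_hat


def toy_step(s, a, rng):
    # Hypothetical deterministic environment: action 1 pays 1 and leads to
    # state 1; action 0 pays 0 and leads to state 0.
    return (1, 1.0) if a == 1 else (0, 0.0)


def baseline(s):
    # Hypothetical HDM default behavior: always take action 0.
    return 0


Q, theta_hat = adherence_aware_q_learning(2, 2, toy_step, baseline, true_theta=0.7)
```

In this toy setting the learner should come to recommend action 1 in both states, and `theta_hat` should settle near the simulated adherence probability. Whether the HDM's fallback behavior is known, and how adherence is observed when the recommendation coincides with that fallback, are modeling choices of this sketch rather than claims about the letter.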