RO, Apr 12, 2021

Risk-Averse Biased Human Policies in Assistive Multi-Armed Bandit Settings

arXiv:2104.05334v1
Originality: Incremental advance
AI Analysis

The paper addresses human-robot team efficiency in assistive scenarios, but the advance is incremental because it builds on existing bandit and bias models.

The paper addresses human risk-aversion in assistive multi-armed bandit settings by extending the model to use observable rewards. The resulting algorithm increases team utility by letting the robot eliminate the bias and make more rational choices.

Assistive multi-armed bandit problems can be used to model team situations between a human and an autonomous system such as a domestic service robot. To account for human biases such as the risk-aversion described in Cumulative Prospect Theory, the setting is expanded to use observable rewards. When robots leverage knowledge of the risk-averse human model, they eliminate the bias and make more rational choices. We present an algorithm that increases the utility value of such human-robot teams. A brief evaluation indicates that arbitrary reward functions can be handled.
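To make the bias concrete, here is a minimal sketch of how a risk-averse valuation can steer a human away from the arm with the higher expected reward, and how an agent that sees the raw rewards would choose differently. It is not the paper's algorithm. It assumes the Cumulative Prospect Theory value function with the commonly cited Tversky and Kahneman (1992) parameter estimates, omits probability weighting for brevity, and uses two hypothetical arms (`safe`, `risky`) whose payoffs are invented for illustration.

```python
# Illustrative sketch only: the arms, names, and parameters are assumptions,
# not taken from the paper.

ALPHA = 0.88   # curvature of the value function (Tversky & Kahneman 1992 estimate)
LAMBDA = 2.25  # loss-aversion coefficient (Tversky & Kahneman 1992 estimate)


def cpt_value(x, alpha=ALPHA, lam=LAMBDA):
    """CPT value function: concave for gains, steeper and convex for losses."""
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** alpha)


def biased_utility(arm):
    """Subjective value of an arm under the CPT value function.

    Simplification: outcome probabilities are used unweighted.
    """
    return sum(p * cpt_value(x) for p, x in arm)


def expected_reward(arm):
    """Unbiased expected reward, as an agent observing raw rewards would score it."""
    return sum(p * x for p, x in arm)


# Hypothetical arms, each a list of (probability, reward) pairs.
ARMS = {
    "safe": [(1.0, 1.0)],                # always pays 1
    "risky": [(0.5, 4.0), (0.5, -1.0)],  # expected reward 1.5
}

human_choice = max(ARMS, key=lambda name: biased_utility(ARMS[name]))
robot_choice = max(ARMS, key=lambda name: expected_reward(ARMS[name]))

print("human picks:", human_choice)
print("robot picks:", robot_choice)
```

With these assumed payoffs, the loss-averse valuation favors the `safe` arm even though `risky` has the higher expected reward, which is the kind of gap a robot with access to observable rewards could close.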

