LG · ME · Sep 9, 2022

Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health

arXiv:2209.04356v1 · 2 citations · h-index: 40
Originality: Incremental advance
AI Analysis

This paper addresses risk-sensitive decision-making in mobile health applications such as emotion regulation. It appears incremental, since it builds on existing bandit frameworks and adds a focus on unobserved confounders.

The paper tackles the problem of risk-averse multi-armed bandits with unobserved confounders, aiming to learn a policy that minimizes risk rather than maximizing expected return, and demonstrates a method to identify the minimum-risk arm with fewer online steps while avoiding bias from expert data.

In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are only observable by the expert but not by the learner. Thus, such contexts are unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding possible biased decisions due to the presence of UCs in the expert's data.
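The difference between the risk-neutral and risk-averse objectives described above can be illustrated with a small sketch. It assumes conditional value-at-risk (CVaR) of the lower reward tail as the risk measure, which the abstract does not specify, and the two synthetic arms, the function names `empirical_cvar` and `select_arms`, and the level `alpha` are all hypothetical. It does not implement the paper's transfer-learning method or any handling of unobserved confounders; it only shows that the arm with the highest mean need not be the minimum-risk arm.

```python
import numpy as np


def empirical_cvar(rewards, alpha=0.1):
    """Average of the worst alpha-fraction of rewards (lower tail).

    Assumption: CVaR is used here purely for illustration; a higher
    value means less risk of a low return.
    """
    rewards = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * rewards.size)))
    return rewards[:k].mean()


def select_arms(samples_per_arm, alpha=0.1):
    """Return (risk-neutral choice, risk-averse choice) as arm indices."""
    means = [np.mean(s) for s in samples_per_arm]
    cvars = [empirical_cvar(s, alpha) for s in samples_per_arm]
    # Risk-neutral: maximize the mean. Risk-averse: maximize lower-tail CVaR.
    return int(np.argmax(means)), int(np.argmax(cvars))


rng = np.random.default_rng(0)
# Hypothetical arms: arm 0 has the higher mean but a heavy lower tail,
# arm 1 has a slightly lower mean and is much more stable.
arm0 = rng.normal(loc=1.0, scale=2.0, size=5000)
arm1 = rng.normal(loc=0.8, scale=0.2, size=5000)

risk_neutral, risk_averse = select_arms([arm0, arm1], alpha=0.1)
print("risk-neutral arm:", risk_neutral)
print("risk-averse arm:", risk_averse)
```

In this toy setting the two criteria disagree: a mean-maximizing learner prefers arm 0, while a learner that guards against low returns prefers arm 1. Estimating such tail statistics from an expert's logged data would additionally be biased if the expert chose arms using contexts the learner cannot see, which is the issue the paper targets.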

