LG | ML | Jul 13, 2021

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

arXiv:2107.06317v3 | 15 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for transparency in evolving decision processes such as medical practice. It is rated incremental because it builds on existing contextual bandit frameworks.

The authors address the problem of learning non-stationary decision-making policies from observed behavior, in settings such as healthcare. They formalize Inverse Contextual Bandits, propose algorithms that provide interpretable representations, and demonstrate applicability and accuracy on liver transplantation data.

Understanding a decision-maker's priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare. Though conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: Medical practice is constantly evolving as clinical professionals fine-tune their knowledge over time. For instance, as the medical community's understanding of organ transplantations has progressed over the years, a pertinent question is: How have actual organ allocation policies been evolving? To give an answer, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent's non-stationary knowledge of the world, as well as operating in an offline manner. First, we model the evolving behavior of decision-makers in terms of contextual bandits, and formalize the problem of Inverse Contextual Bandits (ICB). Second, we propose two concrete algorithms as solutions, learning parametric and nonparametric representations of an agent's behavior. Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating its accuracy.
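To make the setting concrete, the following is a minimal sketch of the forward process the abstract describes: an agent whose knowledge of the world, and therefore its policy, changes while it acts. It is not the authors' algorithm. The linear reward model, the softmax action rule, the gradient-style belief update, and every name in the code (`simulate_learning_agent`, `belief`, `lr`, and so on) are assumptions chosen for illustration only.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn a list of scores into action probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_learning_agent(true_weights, n_rounds=500, n_actions=3,
                            lr=0.1, seed=0):
    """Simulate a contextual-bandit agent with evolving reward beliefs.

    Hypothetical setup (not taken from the paper): rewards are linear in
    the chosen action's features, and the agent refines its belief about
    the weights with one gradient step per round.
    """
    rng = random.Random(seed)
    dim = len(true_weights)
    belief = [0.0] * dim  # the agent starts with no knowledge
    log = []
    for t in range(n_rounds):
        # Context: one feature vector per available action.
        contexts = [[rng.gauss(0, 1) for _ in range(dim)]
                    for _ in range(n_actions)]
        # The agent scores actions with its *current* belief.
        scores = [sum(b * x for b, x in zip(belief, ctx))
                  for ctx in contexts]
        probs = softmax(scores)
        action = rng.choices(range(n_actions), weights=probs)[0]
        x = contexts[action]
        # The environment returns a noisy reward for the chosen action.
        reward = sum(w * xi for w, xi in zip(true_weights, x))
        reward += rng.gauss(0, 0.1)
        # Belief update: move the prediction toward the observed reward.
        error = reward - scores[action]
        belief = [b + lr * error * xi for b, xi in zip(belief, x)]
        log.append({"t": t, "contexts": contexts, "action": action,
                    "belief": list(belief)})
    return log


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Calling `simulate_learning_agent([1.0, -0.5])` returns a per-round log. The `contexts` and `action` entries stand in for observed behavior, while the `belief` entries are kept only as ground truth for checking. In the inverse setting the belief trajectory is what one would try to recover from the behavior, which is why a stationary policy model would misdescribe the early rounds of such a log.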

Code Implementations: 2 repos
