Designing an Interpretable Interface for Contextual Bandits
This paper addresses the interpretability of personalized recommender systems for non-expert operators, though the contribution is incremental because it builds on existing off-policy evaluation methods.
The paper tackles the interpretability challenge in contextual bandit systems by designing a new interface, built around a 'value gain' metric, that explains bandit behavior to non-expert operators; a qualitative user study suggests the interface helps them manage these systems.
Contextual bandits have become an increasingly popular solution for personalized recommender systems. Despite their growing use, the interpretability of these systems remains a significant challenge, particularly for the often non-expert operators tasked with ensuring their optimal performance. In this paper, we address this challenge by designing a new interface that explains the underlying behaviour of a bandit to domain experts. Central to the interface is a metric we term "value gain", a measure derived from off-policy evaluation that quantifies the real-world impact of sub-components within a bandit. We conduct a qualitative user study to evaluate the effectiveness of our interface. Our findings suggest that by carefully balancing technical rigour with accessible presentation, it is possible to empower non-experts to manage complex machine learning systems. We conclude by outlining guiding principles that other researchers should consider when building similar interfaces in the future.
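The abstract does not give a formula for "value gain", so the following is only a minimal sketch of how a metric of this kind could be derived from off-policy evaluation. It assumes a standard inverse propensity scoring (IPS) estimator and defines the gain, hypothetically, as the estimated value of the full policy minus the estimated value of the same policy with one sub-component removed. The function names `ips_value` and `value_gain`, the ablation framing, and the toy data are all illustrative assumptions, not the authors' definition.

```python
import numpy as np


def ips_value(rewards, logging_probs, target_probs):
    """IPS estimate of the average reward a target policy would have earned.

    Each logged reward is reweighted by how much more (or less) likely the
    target policy was to take the logged action than the logging policy.
    """
    rewards = np.asarray(rewards, dtype=float)
    logging_probs = np.asarray(logging_probs, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))


def value_gain(rewards, logging_probs, full_probs, ablated_probs):
    """Hypothetical 'value gain' (an assumption, not the paper's definition):
    estimated value of the full policy minus that of the policy with one
    sub-component removed."""
    full = ips_value(rewards, logging_probs, full_probs)
    ablated = ips_value(rewards, logging_probs, ablated_probs)
    return full - ablated


# Toy logged interactions (made up for illustration only).
rewards = [1.0, 0.0, 1.0, 0.0]        # observed reward per logged action
logging_probs = [0.5, 0.5, 0.5, 0.5]  # probability the logging policy chose it
full_probs = [0.8, 0.2, 0.8, 0.2]     # probability under the full policy
ablated_probs = [0.5, 0.5, 0.5, 0.5]  # probability with a sub-component removed

gain = value_gain(rewards, logging_probs, full_probs, ablated_probs)
print(f"estimated value gain: {gain:.3f}")
```

On this toy data the full policy favours the rewarded actions, so the estimated gain is positive (roughly 0.3 reward per interaction). A single number in reward units like this is the kind of quantity an interface could present to an operator without exposing the estimator behind it.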