LG · AI · ML · Sep 16, 2020

Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback

arXiv:2009.07518v1 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of scarce user feedback in online learning for businesses undergoing digitization. It appears incremental, as it builds on existing bandit methods.

The paper tackles the heavy demand for explicit user feedback in combinatorial multi-armed bandit algorithms for recommender systems. It proposes an approach that reduces the feedback required while maintaining global accuracy and learning efficiency, achieving results similar to state-of-the-art methods with as little as 20% of the total feedback.

Recent work on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) shows good results on a global accuracy metric. In recommender systems, this can be achieved with personalization. However, with a combinatorial online learning approach, personalization requires a large amount of user feedback. Such feedback can be hard to acquire when users need to be directly and frequently solicited. For many fields of activity undergoing the digitization of their business, online learning is unavoidable. Thus, a number of approaches allowing implicit user feedback retrieval have been implemented. Nevertheless, this implicit feedback can be misleading or inefficient for the agent's learning. Herein, we propose a novel approach that reduces the amount of explicit feedback required by COM-MAB algorithms while providing levels of global accuracy and learning efficiency similar to those of classical competitive methods. We present this approach for considering user feedback and evaluate it using three distinct strategies. Despite the limited feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
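
To make the partial-feedback setting concrete, the following is a minimal illustrative sketch. It is not the paper's algorithm. It assumes a generic UCB1-style combinatorial agent with semi-bandit feedback (one reward per recommended arm) in which the user returns feedback only on a fraction of rounds. The function name `run_partial_feedback_bandit`, the click probabilities in `means`, and the `feedback_rate` parameter are all hypothetical and chosen for illustration only.

```python
import math
import random


def run_partial_feedback_bandit(means, k, feedback_rate, rounds, seed=0):
    """Simulate a UCB1-style combinatorial agent under scarce feedback.

    means         -- hypothetical true click probability of each arm
    k             -- number of arms recommended together each round
    feedback_rate -- probability that the user answers in a given round
    rounds        -- total number of recommendation rounds
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms     # number of feedback observations per arm
    totals = [0.0] * n_arms  # sum of observed rewards per arm
    feedback_rounds = 0      # rounds in which the user answered

    for _ in range(rounds):
        t = feedback_rounds + 1  # the agent's clock only advances on feedback

        def ucb(arm):
            # Arms never observed get top priority so each is tried once.
            if pulls[arm] == 0:
                return float("inf")
            mean = totals[arm] / pulls[arm]
            return mean + math.sqrt(2.0 * math.log(t) / pulls[arm])

        # Super-arm: the k arms with the highest upper confidence bound.
        chosen = sorted(range(n_arms), key=ucb, reverse=True)[:k]

        # Assumption: feedback is all-or-nothing per round. When the user
        # stays silent, the agent learns nothing from this round.
        if rng.random() < feedback_rate:
            feedback_rounds += 1
            for arm in chosen:
                reward = 1.0 if rng.random() < means[arm] else 0.0
                pulls[arm] += 1
                totals[arm] += reward

    estimates = [totals[a] / pulls[a] if pulls[a] else 0.0 for a in range(n_arms)]
    return estimates, pulls, feedback_rounds


if __name__ == "__main__":
    est, pulls, answered = run_partial_feedback_bandit(
        means=[0.1, 0.3, 0.5, 0.7, 0.9], k=2, feedback_rate=0.2, rounds=5000
    )
    print("rounds with feedback:", answered)
    print("estimated means:", [round(e, 2) for e in est])
```

Running the same function with `feedback_rate=1.0` gives a full-feedback baseline for comparison. How the paper itself selects or weights the scarce feedback is not described in the abstract, so this sketch makes no claim about that.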

