LG · AI · IR · Aug 26, 2024

Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications

arXiv:2408.14432v2 · 2 citations · h-index: 2
AI Analysis

The paper addresses feedback bias in recommendation systems and offers an incremental improvement for online decision-making.

The paper tackles herding effects that bias user feedback in contextual bandit algorithms for recommendation. It proposes the TS-Conf algorithm, which outperforms benchmark algorithms and improves both learning speed and recommendation accuracy.

Contextual bandits serve as a fundamental algorithmic framework for optimizing recommendation decisions online. Though extensive attention has been paid to tailoring contextual bandits for recommendation applications, the "herding effects" in user feedback have been ignored. These herding effects bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits. This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects. A user feedback model is formulated to capture this feedback bias. We design the TS-Conf (Thompson Sampling under Conformity) algorithm, which employs posterior sampling to balance the exploration-exploitation tradeoff. We prove an upper bound for the regret of the algorithm, revealing the impact of herding effects on learning speed. Extensive experiments on datasets demonstrate that TS-Conf outperforms four benchmark algorithms. Analysis reveals that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
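The abstract describes TS-Conf only at a high level. To illustrate the two ingredients it names, a biased user feedback model and posterior sampling, here is a minimal Python/NumPy sketch under stated assumptions. The herding model in `herded_feedback` (observed rating = (1 - w) * true rating + w * historical mean + Gaussian noise) and the assumption that the herding weight `w` is known are ours, not the paper's formulation. `DebiasedLinTS` is a generic linear Thompson sampler with a simple bias-inversion step; it is not TS-Conf itself, and all names are hypothetical.

```python
import numpy as np


def herded_feedback(true_rating, hist_mean, w, rng, noise_sd=0.1):
    # Hypothetical herding model (an assumption, not the paper's formulation):
    # the observed rating is pulled toward the item's historical mean with weight w.
    return (1.0 - w) * true_rating + w * hist_mean + rng.normal(0.0, noise_sd)


class DebiasedLinTS:
    """Linear Thompson sampling with a Gaussian prior on the preference vector.

    Before each posterior update it inverts the assumed herding model, so the
    regression target is an unbiased (but noisier) estimate of the true rating.
    Assumes the herding weight w is known and 0 <= w < 1.
    """

    def __init__(self, dim, w, prior_var=1.0, noise_var=0.01):
        self.w = w
        self.noise_var = noise_var                 # matches noise_sd=0.1 above
        self.precision = np.eye(dim) / prior_var   # posterior precision matrix
        self.b = np.zeros(dim)                     # sum of x * target / variance

    def posterior_mean(self):
        return np.linalg.solve(self.precision, self.b)

    def choose(self, contexts, rng):
        # Draw theta ~ N(mean, precision^-1) via the Cholesky factor of the precision.
        chol = np.linalg.cholesky(self.precision)
        z = rng.standard_normal(len(self.b))
        theta = self.posterior_mean() + np.linalg.solve(chol.T, z)
        return int(np.argmax(contexts @ theta))

    def update(self, x, observed, hist_mean):
        # Undo the assumed bias: in expectation, target equals the true rating.
        target = (observed - self.w * hist_mean) / (1.0 - self.w)
        # Dividing by (1 - w) inflates the noise variance by 1 / (1 - w)^2.
        eff_var = self.noise_var / (1.0 - self.w) ** 2
        self.precision += np.outer(x, x) / eff_var
        self.b += x * target / eff_var


def simulate(T=2000, n_items=5, dim=3, w=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=dim)
    items = rng.normal(size=(n_items, dim))  # fixed item feature vectors
    true_ratings = items @ theta_true
    hist_sum = np.zeros(n_items)
    hist_cnt = np.zeros(n_items)
    agent = DebiasedLinTS(dim, w)
    regret = 0.0
    for _ in range(T):
        a = agent.choose(items, rng)
        hist_mean = hist_sum[a] / hist_cnt[a] if hist_cnt[a] > 0 else 0.0
        y = herded_feedback(true_ratings[a], hist_mean, w, rng)
        agent.update(items[a], y, hist_mean)
        hist_sum[a] += y  # the displayed history accumulates herded ratings
        hist_cnt[a] += 1
        regret += true_ratings.max() - true_ratings[a]
    return regret, true_ratings
```

Setting `w=0` reduces the update to ordinary linear Thompson sampling. Because the correction divides by `1 - w`, the effective noise grows as herding strengthens, which gives one intuition for why herding could slow learning; the paper's regret bound is what actually quantifies that effect.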
