LGAIMLFeb 29, 2020

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

arXiv:2003.00359v143 citations
AI Analysis

This addresses the challenge of adapting to changing user preferences in recommender systems, representing an incremental improvement with specific algorithmic extensions.

The paper tackles the problem of personalized recommendation in non-stationary environments with time-varying user interests by proposing contextual bandit models with disjoint and hybrid payoffs, achieving sublinear regret scaling in time T and showing empirical advantages over baselines on real-world datasets.

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length $T$ is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes