LG · GT · IR · Feb 12, 2023

Online Recommendations for Agents with Discounted Adaptive Preferences

arXiv:2302.06014v2 · 6 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making recommendations to agents with adaptive, memory-dependent preferences. It is an incremental advance: it extends prior settings, which assumed uniform memory, to non-uniform memory governed by a discount factor.

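The paper tackles the problem of online recommendations for agents whose preferences evolve with their past selections under discounted memory, and aims to minimize regret against a target set for adversarial losses. In the long-term memory regime, it shows that efficient sublinear regret is achievable against the set of everywhere instantaneously realizable distributions (EIRD) for smooth preference models, and against nearly the entire item simplex for scale-bounded preferences. It also shows that expanding to targets beyond EIRD is NP-hard in general, and that in the short-term memory regime there is an information-theoretic barrier to competing against the EIRD set under arbitrary smooth preference models.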

We consider a bandit recommendations problem in which an agent's preferences (representing selection probabilities over recommended items) evolve as a function of past selections, according to an unknown $\textit{preference model}$. In each round, we show a menu of $k$ items (out of $n$ total) to the agent, who then chooses a single item, and we aim to minimize regret with respect to some $\textit{target set}$ (a subset of the item simplex) for adversarial losses over the agent's choices. Extending the setting from Agarwal and Brown (2022), where uniform-memory agents were considered, here we allow for non-uniform memory in which a discount factor is applied to the agent's memory vector at each subsequent round. In the "long-term memory" regime (when the effective memory horizon scales with $T$ sublinearly), we show that efficient sublinear regret is obtainable with respect to the set of $\textit{everywhere instantaneously realizable distributions}$ (the "EIRD set", as formulated in prior work) for any $\textit{smooth}$ preference model. Further, for preferences which are bounded above and below by linear functions of memory weight (we call these "scale-bounded" preferences) we give an algorithm which obtains efficient sublinear regret with respect to nearly the $\textit{entire}$ item simplex. We show an NP-hardness result for expanding to targets beyond EIRD in general. In the "short-term memory" regime (when the memory horizon is constant), we show that scale-bounded preferences again enable efficient sublinear regret for nearly the entire simplex even without smoothness if losses do not change too frequently, yet we show an information-theoretic barrier for competing against the EIRD set under arbitrary smooth preference models even when losses are constant.
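
The interaction loop described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the abstract does not give the memory update rule or the preference model, so the geometric discounting in `update_memory`, the affine score function `affine_scores`, and the uniformly random menu selection are all hypothetical choices made for illustration.

```python
import random


def update_memory(memory, chosen, gamma):
    """Discount every memory weight by gamma, then add one unit to the chosen item.

    Assumption: geometric discounting; the abstract only says a discount
    factor is applied to the memory vector each round.
    """
    return [gamma * w + (1.0 if i == chosen else 0.0) for i, w in enumerate(memory)]


def normalize(weights):
    """Scale non-negative weights to a distribution (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def affine_scores(memory_dist, lam=0.5):
    """Hypothetical preference model: each item's score is affine in its memory weight."""
    return [(1.0 - lam) * v + lam for v in memory_dist]


def choice_probabilities(menu, scores):
    """Selection probability of each menu item, proportional to its score."""
    total = sum(scores[i] for i in menu)
    return {i: scores[i] / total for i in menu}


def simulate(n=5, k=2, gamma=0.9, rounds=100, seed=0):
    """Show a menu of k out of n items each round and let the agent pick one.

    Menus are drawn uniformly at random as a placeholder for a real
    recommendation strategy.
    """
    rng = random.Random(seed)
    memory = [0.0] * n
    choices = []
    for _ in range(rounds):
        menu = rng.sample(range(n), k)
        probs = choice_probabilities(menu, affine_scores(normalize(memory)))
        chosen = rng.choices(menu, weights=[probs[i] for i in menu])[0]
        memory = update_memory(memory, chosen, gamma)
        choices.append(chosen)
    return normalize(memory), choices


if __name__ == "__main__":
    final_memory, picks = simulate()
    print("final memory distribution:", [round(v, 3) for v in final_memory])
    print("last five picks:", picks[-5:])
```

Under this assumed update, a discount factor close to 1 keeps old selections influential for many rounds, while a small discount factor makes the memory depend mostly on the most recent picks, which loosely mirrors the long-term and short-term memory regimes the abstract distinguishes.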

