OCLGPRJan 24, 2013

Reinforcement learning from comparisons: Three alternatives is enough, two is not

arXiv:1301.5734v19 citations
Originality Highly original
AI Analysis

This addresses a fundamental issue in decision-making and ranking systems where intransitive preferences arise, offering a reliable method for applications like recommendation systems or social choice.

The paper tackled the problem of selecting the best alternative using pairwise comparisons that may be intransitive, and proved that a reinforcement learning model converges to the optimal solution when reinforcing the winner among three random alternatives, while a simpler two-alternative model can cycle and fail to converge.

The paper deals with the problem of finding the best alternatives on the basis of pairwise comparisons when these comparisons need not be transitive. In this setting, we study a reinforcement urn model. We prove convergence to the optimal solution when reinforcement of a winning alternative occurs each time after considering three random alternatives. The simpler process, which reinforces the winner of a random pair does not always converges: it may cycle.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes