Konstantin Avrachenkov, Vivek S. Borkar, Sharayu Moharir et al.
We introduce a model of graph-constrained dynamic choice with reinforcement modeled by positively $α$-homogeneous rewards. We show that its empirical process, which can be written as a stochastic approximation recursion with Markov noise, has the same probability law as a certain vertex reinforced random walk. We use this equivalence to show that for $α> 0$, the asymptotic outcome concentrates around the optimum in a certain limiting sense when `annealed' by letting $α\uparrow\infty$ slowly.