SP LG · Apr 2, 2021

Blind Exploration and Exploitation of Stochastic Experts

arXiv:2104.01078v1
Originality: Incremental advance
AI Analysis

This work addresses expert selection and opinion aggregation in unsupervised (blind) variants of the multi-armed bandit problem. It is rated incremental because it builds on existing methods such as UCB and Thompson sampling.

The paper tackles the problem of identifying the most reliable stochastic expert in an unsupervised setting where feedback on the true state is unavailable. It proposes a measure of expert competence inferred from the other experts' opinions and empirically compares several algorithms with their supervised counterparts.
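A minimal worked derivation of why a competence measure built only from other experts' opinions can still rank experts correctly. It assumes a toy model that is not necessarily the paper's: the hidden state is binary, expert i reports it correctly with probability p_i (its true competence) independently of the other experts given the state, and the hypothetical proxy A_i is expert i's average probability of agreeing with the other n − 1 experts. Agreement is observable without the true state, so A_i can be estimated blindly.

```latex
a_{ij} = \Pr(\text{experts } i, j \text{ agree}) = p_i p_j + (1 - p_i)(1 - p_j) = 1 - p_i - p_j + 2 p_i p_j,
\qquad
A_i = \frac{1}{n-1} \sum_{j \neq i} a_{ij},

A_i - A_k = \frac{p_i - p_k}{n-1} \sum_{j \notin \{i, k\}} \left( 2 p_j - 1 \right).
```

Hence, in this toy model, A_i > A_k exactly when p_i > p_k, provided the remaining experts are better than chance in aggregate (the sum is positive). For example, with p = (0.9, 0.7, 0.6) the proxy gives A = (0.62, 0.60, 0.56), matching the true ordering.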

We present blind exploration and exploitation (BEE) algorithms for identifying the most reliable stochastic expert based on formulations that employ posterior sampling, upper-confidence bounds, empirical Kullback-Leibler divergence, and minimax methods for the stochastic multi-armed bandit problem. Joint sampling and consultation of experts whose opinions depend on the hidden and random state of the world becomes challenging in the unsupervised, or blind, framework, as feedback on the true state is not available. We propose an empirically realizable measure of expert competence that can be inferred instantaneously using only the opinions of other experts. This measure preserves the ordering of true competences and thus enables joint sampling and consultation of stochastic experts based on their opinions on dynamically changing tasks. Statistics derived from the proposed measure are instantaneously available, allowing both blind exploration-exploitation and unsupervised opinion aggregation. We discuss how the lack of supervision affects the asymptotic regret of BEE architectures that rely on UCB1, KL-UCB, MOSS, IMED, and Thompson sampling. We demonstrate the performance of different BEE algorithms empirically and compare them to their standard, or supervised, counterparts.
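A minimal sketch of what a blind UCB1 loop could look like, under assumptions that are not taken from the paper: the state is binary, expert i is correct with probability p_i independently of the others, and each round the learner selects one expert, consults one other expert drawn uniformly at random, and uses their agreement as a surrogate reward. The names `agree_prob`, `surrogate_means`, `draw_opinion`, and `blind_ucb1` are hypothetical. The surrogate's mean is the average agreement A_i, a stand-in for the paper's competence measure rather than a reproduction of it; the index itself is standard UCB1.

```python
import math
import random


def agree_prob(p_i: float, p_j: float) -> float:
    """P(two conditionally independent binary experts give the same opinion)."""
    return p_i * p_j + (1 - p_i) * (1 - p_j)


def surrogate_means(competences: list[float]) -> list[float]:
    """Expected blind reward A_i: average agreement of expert i with the others."""
    n = len(competences)
    return [
        sum(agree_prob(p_i, p_j) for j, p_j in enumerate(competences) if j != i) / (n - 1)
        for i, p_i in enumerate(competences)
    ]


def draw_opinion(rng: random.Random, state: int, p: float) -> int:
    """Report the hidden binary state with probability p, its complement otherwise."""
    return state if rng.random() < p else 1 - state


def blind_ucb1(competences: list[float], horizon: int, seed: int = 0) -> list[int]:
    """Hypothetical blind UCB1; returns how often each expert was selected.

    The learner never sees the true state. Each round it selects expert i,
    consults one other expert j drawn uniformly at random, and scores i with
    the surrogate reward 1{opinion_i == opinion_j}, whose mean is A_i.
    """
    rng = random.Random(seed)
    n = len(competences)
    counts = [0] * n
    sums = [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            i = t - 1  # initialisation: select every expert once
        else:
            # UCB1 index computed on surrogate (agreement) rewards only.
            i = max(
                range(n),
                key=lambda k: sums[k] / counts[k] + math.sqrt(2 * math.log(t) / counts[k]),
            )
        state = rng.randint(0, 1)  # simulated hidden state, never shown to the learner
        j = rng.choice([k for k in range(n) if k != i])  # reference expert
        op_i = draw_opinion(rng, state, competences[i])
        op_j = draw_opinion(rng, state, competences[j])
        counts[i] += 1
        sums[i] += float(op_i == op_j)
    return counts


if __name__ == "__main__":
    print(surrogate_means([0.9, 0.7, 0.6]))  # approximately [0.62, 0.60, 0.56]
    print(blind_ucb1([0.95, 0.7, 0.7, 0.7], horizon=20_000))
```

With competences (0.95, 0.7, 0.7, 0.7), the first expert should receive most of the selections even though no reward ever reveals the true state. In this toy model the gaps between surrogate means are smaller than the gaps between true competences, so the learner needs more samples to separate experts than a supervised UCB1 would; this is one intuition for why blindness can affect regret, not the paper's analysis.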
