LGJun 6, 2022

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

Stanford
arXiv:2206.02326v211 citationsh-index: 72
Originality Highly original
AI Analysis

This provides a foundational advancement for researchers in machine learning and AI by offering a general framework for instance-optimal algorithms in interactive decision making, though it is incremental in building on existing theory.

The paper tackles the problem of designing algorithms for interactive decision making that adapt to the complexity of specific instances rather than worst-case scenarios, resulting in an asymptotic instance-optimal algorithm with regret characterized by a complexity measure C(f) ln n, which recovers and improves upon prior bounds in bandits and reinforcement learning.

Past research on interactive decision making problems (bandits, reinforcement learning, etc.) mostly focuses on the minimax regret that measures the algorithm's performance on the hardest instance. However, an ideal algorithm should adapt to the complexity of a particular problem instance and incur smaller regrets on easy instances than worst-case instances. In this paper, we design the first asymptotic instance-optimal algorithm for general interactive decision making problems with finite number of decisions under mild conditions. On every instance $f$, our algorithm outperforms all consistent algorithms (those achieving non-trivial regrets on all instances), and has asymptotic regret $\mathcal{C}(f) \ln n$, where $\mathcal{C}(f)$ is an exact characterization of the complexity of $f$. The key step of the algorithm involves hypothesis testing with active data collection. It computes the most economical decisions with which the algorithm collects observations to test whether an estimated instance is indeed correct; thus, the complexity $\mathcal{C}(f)$ is the minimum cost to test the instance $f$ against other instances. Our results, instantiated on concrete problems, recover the classical gap-dependent bounds for multi-armed bandits [Lai and Robbins, 1985] and prior works on linear bandits [Lattimore and Szepesvari, 2017], and improve upon the previous best instance-dependent upper bound [Xu et al., 2021] for reinforcement learning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes