ITLGMLJan 29, 2015

Sequential Probability Assignment with Binary Alphabets and Large Classes of Experts

arXiv:1501.07340v132 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of sequential prediction with large expert classes in machine learning, offering theoretical insights and algorithmic improvements for scenarios like high-dimensional parameter spaces, though it is incremental in building on existing complexity frameworks.

The paper tackles the problem of sequential probability assignment for binary outcomes with side information and logarithmic loss, providing upper and lower bounds for minimax regret in terms of sequential complexities of the expert class, with a new analysis using sequential chaining and Bernstein-type bounds to handle unbounded gradients. It also presents an algorithm based on regularization with a self-concordant barrier for a high-dimensional example, achieving non-trivial bounds where typical discretization fails.

We analyze the problem of sequential probability assignment for binary outcomes with side information and logarithmic loss, where regret---or, redundancy---is measured with respect to a (possibly infinite) class of experts. We provide upper and lower bounds for minimax regret in terms of sequential complexities of the class. These complexities were recently shown to give matching (up to logarithmic factors) upper and lower bounds for sequential prediction with general convex Lipschitz loss functions (Rakhlin and Sridharan, 2015). To deal with unbounded gradients of the logarithmic loss, we present a new analysis that employs a sequential chaining technique with a Bernstein-type bound. The introduced complexities are intrinsic to the problem of sequential probability assignment, as illustrated by our lower bound. We also consider an example of a large class of experts parametrized by vectors in a high-dimensional Euclidean ball (or a Hilbert ball). The typical discretization approach fails, while our techniques give a non-trivial bound. For this problem we also present an algorithm based on regularization with a self-concordant barrier. This algorithm is of an independent interest, as it requires a bound on the function values rather than gradients.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes