LG MLJul 1, 2016

A scaled Bregman theorem with applications

Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

arXiv:1607.00360v16.815 citations

Originality Incremental advance

AI Analysis

This provides a theoretical tool for simplifying algorithm analysis in machine learning, though it appears incremental as it builds on existing Bregman divergence properties.

The paper tackles the challenge of analyzing algorithms that rely on Bregman distortions by introducing a scaled Bregman theorem, which allows these distortions to be rewritten as scaled Bregman divergences over transformed data, enabling reductions in multi-class density ratio estimation, adaptive projection-free mirror descent, and clustering across manifolds.

Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain "Bregman distortions'" (employing a potentially non-convex generator) may be exactly re-written as a scaled Bregman divergence computed over transformed data. Admissible distortions include geodesic distances on curved manifolds and projections or gauge-normalisation, while admissible data include scalars, vectors and matrices. Our theorem allows one to leverage to the wealth and convenience of Bregman divergences when analysing algorithms relying on the aforementioned Bregman distortions. We illustrate this with three novel applications of our theorem: a reduction from multi-class density ratio to class-probability estimation, a new adaptive projection free yet norm-enforcing dual norm mirror descent algorithm, and a reduction from clustering on flat manifolds to clustering on curved manifolds. Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

View on arXiv PDF

Similar