LG SY OC MLApr 13, 2020

A Theory of the Risk for Optimization with Relaxation and its Application to Support Vector Machines

arXiv:2004.05839v45.824 citations

Originality Incremental advance

AI Analysis

This work provides theoretical guarantees for generalization in support vector machines, addressing a foundational problem in machine learning, though it builds incrementally on prior research.

The authors extended a prior theory linking risk and complexity in optimization with relaxation to cover more machine learning algorithms, and applied it to support vector methods to derive new generalization results valid for any finite data size, showing that risk approaches the ratio of complexity to sample size as data tends to infinity.

In this paper we consider optimization with relaxation, an ample paradigm to make data-driven designs. This approach was previously considered by the same authors of this work in Garatti and Campi (2019), a study that revealed a deep-seated connection between two concepts: risk (probability of not satisfying a new, out-of-sample, constraint) and complexity (according to a definition introduced in paper Garatti and Campi (2019)). This connection was shown to have profound implications in applications because it implied that the risk can be estimated from the complexity, a quantity that can be measured from the data without any knowledge of the data-generation mechanism. In the present work we establish new results. First, we expand the scope of Garatti and Campi (2019) so as to embrace a more general setup that covers various algorithms in machine learning. Then, we study classical support vector methods - including SVM (Support Vector Machine), SVR (Support Vector Regression) and SVDD (Support Vector Data Description) - and derive new results for the ability of these methods to generalize. All results are valid for any finite size of the data set. When the sample size tends to infinity, we establish the unprecedented result that the risk approaches the ratio between the complexity and the cardinality of the data sample, regardless of the value of the complexity.

View on arXiv PDF

Similar