Prediction, Learning, Uniform Convergence, and Scale-sensitive Dimensions
This work addresses foundational questions in learning theory: it gives new sufficient and necessary conditions for learnability and uniform convergence over classes of $[0,1]$-valued functions, with implications for the design and analysis of learning algorithms.
The paper studies the problem of learning classes of $[0,1]$-valued functions in a prediction model. It presents a new algorithm whose expected absolute error is bounded above in terms of a scale-sensitive dimension, and it uses this result to derive improved bounds on the sample complexity of agnostic learning.
We present a new general-purpose algorithm for learning classes of $[0,1]$-valued functions in a generalization of the prediction model, and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization of the Vapnik dimension proposed by Alon, Ben-David, Cesa-Bianchi and Haussler. We give lower bounds implying that our upper bounds cannot be improved by more than a constant factor in general. We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension. Using a different technique, we obtain new bounds on packing numbers in terms of Kearns and Schapire's fat-shattering function. We show how to apply both packing bounds to obtain improved general bounds on the sample complexity of agnostic learning. For each $ε > 0$, we establish weaker sufficient and stronger necessary conditions for a class of $[0,1]$-valued functions to be agnostically learnable to within $ε$, and to be an $ε$-uniform Glivenko-Cantelli class. This is a manuscript that was accepted by JCSS, together with a correction.
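As an illustration of the fat-shattering function mentioned above, the following is a minimal sketch and not the paper's algorithm. It assumes the standard definition: points $x_1, \dots, x_d$ are $γ$-shattered by a class $F$ if there are witnesses $r_1, \dots, r_d$ such that for every bit pattern $b$ some $f \in F$ has $f(x_i) \ge r_i + γ$ when $b_i = 1$ and $f(x_i) \le r_i - γ$ when $b_i = 0$; the fat-shattering function at scale $γ$ is the largest such $d$. The sketch brute-forces this for a finite class on a finite domain. The names `is_gamma_shattered`, `fat_shattering_dimension` and `toy_class` are hypothetical, and representing a function as a dict of values is an assumption made only for the example.

```python
from itertools import combinations, product


def is_gamma_shattered(functions, points, gamma):
    """Return True if `points` is gamma-shattered by the finite class `functions`.

    Each function is a dict mapping a point to a value in [0, 1].
    """
    # For a finite class it suffices to try upper cutoffs u_i drawn from the
    # observed values f(x_i); the corresponding witness is r_i = u_i - gamma.
    # Then "f(x_i) >= r_i + gamma" becomes "f(x_i) >= u_i" and
    # "f(x_i) <= r_i - gamma" becomes "f(x_i) <= u_i - 2 * gamma".
    candidate_cutoffs = [sorted({f[x] for f in functions}) for x in points]
    for cutoffs in product(*candidate_cutoffs):
        if all(
            any(
                all(
                    (f[x] >= u) if bit else (f[x] <= u - 2 * gamma)
                    for x, u, bit in zip(points, cutoffs, pattern)
                )
                for f in functions
            )
            for pattern in product((0, 1), repeat=len(points))
        ):
            return True
    return False


def fat_shattering_dimension(functions, domain, gamma):
    """Largest d such that some d-subset of `domain` is gamma-shattered."""
    best = 0
    for d in range(1, len(domain) + 1):
        if any(is_gamma_shattered(functions, subset, gamma)
               for subset in combinations(domain, d)):
            best = d
        else:
            # Subsets of a shattered set are shattered, so no larger set works.
            break
    return best


# Toy class (an assumption for illustration): all {0, 1}-valued functions
# on a two-point domain.
toy_domain = ["a", "b"]
toy_class = [{"a": p, "b": q} for p in (0.0, 1.0) for q in (0.0, 1.0)]
```

On this toy class, `fat_shattering_dimension(toy_class, toy_domain, 0.5)` returns 2, while any scale above 0.5 gives 0, which shows how the dimension depends on the scale $γ$. The search is exponential in the number of points and is meant only for tiny examples; comparisons at the boundary are sensitive to floating-point rounding, so exact values such as `fractions.Fraction` are safer for anything beyond a toy case.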