On aggregation for heavy-tailed classes
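This work addresses a theoretical limitation of learning on non-convex classes: a procedure that must select a function from the given class cannot always attain the error rate that is optimal for convex, regular classes. The proposed remedy is aimed at learning under heavy-tailed distributions.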
The paper introduces an aggregation procedure for heavy-tailed classes. The procedure attains an alternative to the `fast rate', one that coincides with the optimal error rate when the class is convex and regular, and it does so under minimal assumptions: for example, that the $L_q$ and $L_2$ norms are equivalent on the linear span of the class for some $q>2$, and that the target is square-integrable.
We introduce an alternative to the notion of `fast rate' in Learning Theory, which coincides with the optimal error rate when the given class happens to be convex and regular in some sense. While it is well known that such a rate cannot always be attained by a learning procedure (i.e., a procedure that selects a function in the given class), we introduce an aggregation procedure that attains that rate under rather minimal assumptions -- for example, that the $L_q$ and $L_2$ norms are equivalent on the linear span of the class for some $q>2$, and the target random variable is square-integrable.
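As a reading aid, the example assumption can be written out explicitly. The symbols are assumptions introduced here for illustration, not notation taken from the text: $F$ denotes the given class, $Y$ the target random variable, and $L \ge 1$ a constant. On a probability space the inequality $\|f\|_{L_2} \le \|f\|_{L_q}$ holds automatically for $q>2$, so norm equivalence amounts to the reverse bound:

\[
\|f\|_{L_q} \le L \, \|f\|_{L_2} \quad \text{for every } f \in \operatorname{span}(F) \text{ and some } q>2, \qquad \mathbb{E}\, Y^2 < \infty .
\]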