Fast Rates by Transferring from Auxiliary Hypotheses
This work studies how transfer learning can accelerate learning rates. The contribution may appear incremental, since it builds on existing ERM-based linear algorithms.
The paper improves learning rates in transfer learning by leveraging auxiliary hypotheses from other tasks. Given a good combination of source hypotheses, generalization happens at the fast rate $\mathcal{O}(1/m)$ instead of the usual $\mathcal{O}(1/\sqrt{m})$; if the combination is a poor fit for the target task, the standard rate is recovered.
In this work, we consider the learning setting where, in addition to the training set, the learner receives a collection of auxiliary hypotheses originating from other tasks. We focus on a broad class of ERM-based linear algorithms that can be instantiated with any non-negative smooth loss function and any strongly convex regularizer. We establish generalization and excess risk bounds, showing that, if the algorithm is fed with a good combination of source hypotheses, generalization happens at the fast rate $\mathcal{O}(1/m)$ instead of the usual $\mathcal{O}(1/\sqrt{m})$. On the other hand, if the combination of source hypotheses is a misfit for the target task, we recover the usual learning rate. As a byproduct of our study, we also prove a new bound on the Rademacher complexity of the smooth loss class under weaker assumptions than those of previous works.
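To make the setting concrete, the following is a minimal sketch of one possible instance of such an ERM-based linear algorithm. It assumes the squared loss (non-negative and smooth) and a squared L2 regularizer (strongly convex), and it assumes the combination weights of the source hypotheses are fixed in advance. The names `fit_with_sources`, `predict`, `source_preds`, `beta`, and `lam` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def fit_with_sources(X, y, source_preds, beta, lam):
    """Regularized ERM on top of a fixed combination of source hypotheses.

    Assumed predictor form: h(x) = <w, x> + sum_i beta[i] * h_i_src(x).
    With squared loss and penalty lam * ||w||^2, the minimizer over w is a
    ridge regression fit to the residual left by the source combination.

    X            : (m, d) target training inputs
    y            : (m,)   target training labels
    source_preds : (m, n) predictions of the n source hypotheses on X
    beta         : (n,)   fixed combination weights (an assumption here)
    lam          : float  regularization strength, lam > 0
    """
    m, d = X.shape
    residual = y - source_preds @ beta      # what the sources fail to explain
    A = X.T @ X / m + lam * np.eye(d)       # regularized empirical covariance
    b = X.T @ residual / m
    return np.linalg.solve(A, b)

def predict(X, source_preds, w, beta):
    """Evaluate the linear part plus the source combination."""
    return X @ w + source_preds @ beta
```

In this sketch, a source combination that already fits the target labels leaves a zero residual, so the learned `w` is zero and the learner only has to confirm the sources. Setting `beta` to zero reduces the procedure to ordinary ridge regression, which loosely mirrors the abstract's fallback to the usual rate when the sources do not help.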