Optimization with Access to Auxiliary Information
This work addresses optimization efficiency in practical settings such as federated learning and transfer learning, but its contribution is incremental: it adds new algorithms to existing frameworks.
The paper tackles the problem of minimizing a target function with expensive gradients by leveraging cheap auxiliary functions, showing that benefits arise under Hessian similarity assumptions and correlated stochastic noise.
We investigate the fundamental optimization question of minimizing a target function $f$, whose gradients are expensive to compute or have limited availability, given access to some auxiliary side function $h$ whose gradients are cheap or more available. This formulation captures many settings of practical relevance, such as i) re-using batches in SGD, ii) transfer learning, iii) federated learning, iv) training with compressed models/dropout, etc. We propose two new generic algorithms that apply in all these settings. We also prove that we can benefit from this framework under a Hessian similarity assumption between the target and the side information: a benefit is obtained when this similarity measure is small. Finally, we show a potential benefit from stochasticity when the auxiliary noise is correlated with that of the target function.
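To make the setting concrete, here is a minimal sketch of one way cheap gradients of $h$ can stand in for expensive gradients of $f$. It is not one of the two algorithms proposed in the paper; it is a generic bias-corrected scheme chosen for illustration. The function names, the toy quadratics, and the step sizes are all assumptions. The sketch also assumes the similarity measure takes the common form $\|\nabla^2 f(x) - \nabla^2 h(x)\| \le \delta$, which the abstract does not spell out.

```python
# Illustrative sketch only: a bias-corrected auxiliary-gradient loop on toy
# quadratics. NOT the paper's algorithms; every name here is hypothetical.

def quadratic_grad(A, b):
    """Return the gradient function of 0.5 * x^T A x - b^T x (A symmetric)."""
    def grad(x):
        return [sum(A[i][j] * x[j] for j in range(len(x))) - b[i]
                for i in range(len(x))]
    return grad


def minimize_with_auxiliary(grad_f, grad_h, x0,
                            outer_steps=20, inner_steps=10, lr=0.2):
    """Use one expensive grad_f call per outer step and cheap grad_h calls inside."""
    x = list(x0)
    expensive_calls = 0
    for _ in range(outer_steps):
        # Expensive target gradient, evaluated once at the current anchor point.
        anchor_grad_f = grad_f(x)
        expensive_calls += 1
        anchor_grad_h = grad_h(x)
        # Constant correction that removes the gradient mismatch at the anchor.
        correction = [gf - gh for gf, gh in zip(anchor_grad_f, anchor_grad_h)]
        for _ in range(inner_steps):
            # Cheap surrogate gradient: grad h(x) + (grad f(anchor) - grad h(anchor)).
            surrogate = [gh + c for gh, c in zip(grad_h(x), correction)]
            x = [xi - lr * gi for xi, gi in zip(x, surrogate)]
    return x, expensive_calls


# Toy target (assumed for illustration): its minimizer is (1, 1).
grad_f = quadratic_grad([[2.0, 0.0], [0.0, 1.0]], [2.0, 1.0])
# Toy auxiliary: a slightly different Hessian and a very different linear term.
grad_h = quadratic_grad([[2.1, 0.05], [0.05, 0.9]], [0.0, 0.0])

x_final, calls = minimize_with_auxiliary(grad_f, grad_h, [5.0, -3.0])
print(x_final, calls)
```

For these quadratics the surrogate's error relative to the true gradient is exactly $(\nabla^2 f - \nabla^2 h)(x - x_{\text{anchor}})$, so it shrinks as the two Hessians get closer. With the default settings, the loop makes 200 updates using only 20 evaluations of the target gradient.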