Meta Mirror Descent: Optimiser Learning for Fast Convergence
This work addresses the need for faster, more theoretically grounded optimisers in machine learning, offering convergence and generalisation guarantees without requiring validation data. The contribution is somewhat incremental, since it builds on the established mirror descent and meta-learning paradigms.
The paper tackled the problem of designing more effective gradient-descent optimisers by proposing Meta Mirror Descent (MetaMD), which meta-learns the Bregman divergence that defines the mirror descent update in order to accelerate optimisation, and demonstrated strong performance across a variety of tasks and architectures.
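To make the mirror descent view concrete, the following minimal sketch assumes a hypothetical diagonal quadratic mirror map ψ(x) = ½ Σᵢ mᵢxᵢ², whose Bregman divergence is B(x, y) = ½ Σᵢ mᵢ(xᵢ − yᵢ)². This divergence family is an illustrative assumption, not necessarily the one MetaMD uses. With this choice the mirror descent step has the closed form x − η g / m, so m = 1 recovers plain gradient descent, while a divergence matched to the curvature of a toy ill-conditioned quadratic reaches the minimiser in a single step. The function names and the toy objective are hypothetical.

```python
def mirror_descent_step(x, grad, m, lr):
    """One mirror descent step with the hypothetical mirror map psi(x) = 0.5 * sum_i m_i * x_i**2.

    Its Bregman divergence is B(x, y) = 0.5 * sum_i m_i * (x_i - y_i)**2, so
    argmin_x <grad, x> + B(x, x_t) / lr has the closed form x_t - lr * grad / m.
    """
    return [xi - lr * gi / mi for xi, gi, mi in zip(x, grad, m)]


def quadratic_grad(x, curv):
    """Gradient of the toy objective f(x) = 0.5 * sum_i curv_i * x_i**2."""
    return [c * xi for c, xi in zip(curv, x)]


def run(x0, curv, m, lr, steps):
    """Apply `steps` mirror descent updates to the toy objective, starting from x0."""
    x = list(x0)
    for _ in range(steps):
        x = mirror_descent_step(x, quadratic_grad(x, curv), m, lr)
    return x


curv = [1.0, 100.0]  # ill-conditioned toy problem (condition number 100)
x0 = [1.0, 1.0]

# m = 1 gives the Euclidean divergence, i.e. plain gradient descent; stability needs lr < 2 / 100.
gd = run(x0, curv, m=[1.0, 1.0], lr=0.015, steps=50)
# A divergence matched to the curvature reaches the minimiser of this toy problem in one step.
md = run(x0, curv, m=curv, lr=1.0, steps=1)
print("gradient descent after 50 steps:", gd)
print("matched mirror descent after 1 step:", md)  # [0.0, 0.0]
```

The toy example only illustrates that the choice of divergence, not just the step size, governs how fast the iterates converge; the premise of MetaMD is to learn that choice rather than fix it by hand.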
Optimisers are an essential component for training machine learning models, and their design influences both learning speed and generalisation. Several studies have attempted to learn more effective gradient-descent optimisers by solving a bi-level optimisation problem in which generalisation error is minimised with respect to the optimiser parameters. However, most existing optimiser learning methods are intuitively motivated, without clear theoretical support. We take a different perspective, starting from mirror descent rather than gradient descent and meta-learning the corresponding Bregman divergence. Within this paradigm, we formalise a novel meta-learning objective: minimising the regret bound of learning. The resulting framework, termed Meta Mirror Descent (MetaMD), learns to accelerate optimisation. Unlike many meta-learned optimisers, it also provides convergence and generalisation guarantees, and uniquely does so without requiring validation data. We evaluate our framework on a variety of tasks and architectures in terms of convergence rate and generalisation error, and demonstrate strong performance.
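The idea of meta-learning a divergence by minimising a regret bound can be illustrated under the same hypothetical diagonal quadratic mirror map. For online mirror descent with step size η, a standard bound is R_T ≤ Σᵢ mᵢDᵢ²/(2η) + (η/2) Σᵢ Gᵢ/mᵢ, where Dᵢ is the distance from the initial point to the comparator in coordinate i and Gᵢ is the sum of squared gradients in that coordinate. The sketch below meta-learns m by gradient descent on this bound averaged over two tasks with made-up per-task statistics. It is not MetaMD's exact objective: in particular it assumes Dᵢ is known, which a practical method would have to estimate or handle differently. All names here (`regret_bound`, `meta_learn_divergence`, `avg_bound`) are hypothetical.

```python
import math


def regret_bound(m, dist_sq, grad_sq_sum, lr):
    """Standard online mirror descent regret bound for psi(x) = 0.5 * sum_i m_i * x_i**2.

    R_T <= sum_i m_i * D_i**2 / (2 * lr) + (lr / 2) * sum_i G_i / m_i, where
    dist_sq[i] = D_i**2 = (x*_i - x_1,i)**2 and grad_sq_sum[i] = G_i = sum_t g_{t,i}**2.
    """
    return sum(mi * d / (2 * lr) + lr * g / (2 * mi)
               for mi, d, g in zip(m, dist_sq, grad_sq_sum))


def meta_learn_divergence(tasks, lr=0.1, meta_lr=0.02, meta_steps=2000):
    """Hypothetical meta-learner: gradient descent on log(m) to minimise the task-averaged bound.

    tasks is a list of (dist_sq, grad_sq_sum) pairs with one entry per coordinate.
    Optimising log(m) keeps every m_i positive, so psi stays strongly convex.
    """
    n = len(tasks[0][0])
    log_m = [0.0] * n  # start from the Euclidean divergence (m_i = 1)
    for _ in range(meta_steps):
        for i in range(n):
            m_i = math.exp(log_m[i])
            # d(bound)/d(log m_i) = m_i * d(bound)/d(m_i), averaged over tasks.
            grad = sum(m_i * d[i] / (2 * lr) - lr * g[i] / (2 * m_i)
                       for d, g in tasks) / len(tasks)
            log_m[i] -= meta_lr * grad
    return [math.exp(v) for v in log_m]


# Made-up per-task statistics: (squared distances, summed squared gradients).
tasks = [([1.0, 0.5], [2.0, 300.0]),
         ([1.0, 1.5], [0.0, 500.0])]
m = meta_learn_divergence(tasks)


def avg_bound(m_vec):
    """Regret bound averaged over the toy tasks, using the same lr = 0.1."""
    return sum(regret_bound(m_vec, d, g, 0.1) for d, g in tasks) / len(tasks)


print("learned m:", m)
print("bound, Euclidean m:", avg_bound([1.0, 1.0]), "learned m:", avg_bound(m))
```

Because this toy bound is separable, its minimiser also has the closed form mᵢ = η √(Ḡᵢ / D̄²ᵢ), with bars denoting task averages. That scales each coordinate by the square root of its accumulated squared gradients, much like AdaGrad's diagonal preconditioner. The gradient-based loop is shown instead because it resembles how a bi-level meta-learning objective would typically be optimised when no closed form is available.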