Benjamin ThÃ©rien

75.7LGMar 19

$Î¼$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev et al.

Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can struggle to optimize unseen tasks (meta-generalize), especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization ($Î¼$P) for two state-of-the-art learned optimizer architectures and propose a simple meta-training recipe for $Î¼$-parameterized LOs ($Î¼$LOs). Our empirical evaluation demonstrates that LOs meta-trained with our recipe substantially improve meta-generalization to wider unseen tasks when compared to LOs trained under standard parametrization (SP) using the same compute budget. We also empirically observe that $Î¼$LOs exhibit unexpectedly improved meta-generalization to deeper networks ($5\times$ meta-training) and surprising generalization to much longer training horizons ($25\times$ meta-training) when compared to SP LOs.

Benjamin ThÃ©rien

1 Paper