LG · OC · ML · Oct 19, 2020

How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers

arXiv:2010.09889v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of comparing optimizers fairly, which matters to researchers and practitioners alike. Its contribution is incremental, since it builds on existing benchmarking methods.

The authors tackle the problem of benchmarking neural network optimizers by proposing a new evaluation protocol with two parts: end-to-end efficiency, measured under bandit hyperparameter tuning, and data-addition training efficiency, which tests how well previously tuned hyperparameters hold up as new data shifts the training set. Applying the protocol to 7 optimizers across computer vision, NLP, reinforcement learning, and graph mining tasks, they find no clear winner.
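A widely used bandit-based tuner is Hyperband (Li et al.), whose core step is successive halving: many configurations get a small training budget, and only the best fraction survive to receive a larger one. The sketch below shows a single successive-halving bracket. It is a minimal illustration, not the paper's tuning code; `evaluate(config, budget)` is a hypothetical stand-in for "train for `budget` epochs and return validation loss", replaced here by a synthetic function of the learning rate.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """One bracket of successive halving: score every surviving config at the
    current budget, keep the best 1/eta of them, multiply the budget by eta,
    and repeat until one config is left. Returns (best_config, budget_spent)."""
    budget = min_budget
    survivors = list(configs)
    spent = 0  # total budget units consumed, a proxy for tuning cost
    while len(survivors) > 1:
        spent += budget * len(survivors)
        # Lower loss is better; sorted() calls the key once per config.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], spent

def toy_loss(config, budget):
    """Hypothetical loss surface: the best learning rate is 1e-2,
    and a longer training budget lowers the loss."""
    return abs(math.log10(config["lr"]) + 2.0) + 1.0 / budget

grid = [{"lr": lr} for lr in (1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1, 1.0)]
best, cost = successive_halving(grid, toy_loss)
print(best, cost)  # prints: {'lr': 0.01} 18
```

For comparison, training all nine toy configurations for the full 9-unit budget would cost 81 units. Dropping poor configurations after cheap trials is what lets a bandit tuner spend far less on tuning than random search with full-length runs.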

Many optimizers have been proposed for training deep neural networks, and they often have multiple hyperparameters, which makes it tricky to benchmark their performance. In this work, we propose a new benchmarking protocol to evaluate both end-to-end efficiency (training a model from scratch without knowing the best hyperparameters) and data-addition training efficiency (periodically re-training the model on newly collected data with the previously selected hyperparameters). For end-to-end efficiency, unlike previous work that assumes random hyperparameter tuning, which over-emphasizes the tuning time, we propose to evaluate with a bandit hyperparameter tuning strategy. We conduct a human study showing that our evaluation protocol matches human tuning behavior better than random search does. For data-addition training, we propose a new protocol for assessing the sensitivity of hyperparameters to data shift. We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining. Our results show that there is no clear winner across all the tasks.
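The data-addition setting can be illustrated with a toy experiment: choose hyperparameters on the data available at first, reuse them after the dataset grows, and measure how much worse that is than re-tuning on the enlarged data. The gap computed below is an illustrative proxy, not the metric defined in the paper. The function names, the learning-rate grid, and the assumption that the optimal learning rate scales with a power of the dataset size are all hypothetical.

```python
import math

def reuse_gap(configs, evaluate, old_size, new_size):
    """Loss penalty for reusing hyperparameters tuned on `old_size` examples
    after the training set has grown to `new_size` examples."""
    # Tune once, on the data collected so far.
    chosen = min(configs, key=lambda c: evaluate(c, old_size))
    # Periodic re-training on the enlarged data reuses that choice.
    reused = evaluate(chosen, new_size)
    # Reference point: the best loss that re-tuning on the new data would reach.
    retuned = min(evaluate(c, new_size) for c in configs)
    return reused - retuned

def make_toy_loss(shift):
    """Hypothetical loss surface: the optimal lr is 1e-2 at 1,000 examples and
    scales as (n / 1000) ** shift; shift=0 means no sensitivity to data size."""
    def loss(config, n):
        best_lr = 1e-2 * (n / 1000) ** shift
        return (math.log10(config["lr"]) - math.log10(best_lr)) ** 2
    return loss

grid = [{"lr": lr} for lr in (1e-3, 3e-3, 1e-2, 3e-2, 1e-1)]
sensitive = reuse_gap(grid, make_toy_loss(0.5), 1_000, 10_000)
robust = reuse_gap(grid, make_toy_loss(0.0), 1_000, 10_000)
print(round(sensitive, 3), robust)  # prints: 0.249 0.0
```

A gap near zero means hyperparameters tuned earlier keep working as data is added, which is the kind of low sensitivity to data shift that the data-addition protocol is meant to assess.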
