Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
This work targets the high resource cost of pretraining in order to make it more broadly accessible, though its contribution is incremental: it applies existing methods to a new task.
The authors address the problem of democratizing pretraining by training a single neural network that predicts high-quality ImageNet parameters for other networks. Initializing with the predicted parameters boosted training of diverse ImageNet models and, on other datasets, led to faster convergence and competitive final performance.
Pretraining a neural network on a large dataset is becoming a cornerstone of machine learning, yet it is within the reach of only a few communities with large resources. We pursue the ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using the predicted parameters for initialization, we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.
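The initialization step described in the abstract can be illustrated with a minimal, framework-free sketch. Everything in it is an assumption made for illustration: `predict_parameters` is a hypothetical placeholder that returns constant values (it is not the released predictor network), the "model" is a plain dictionary of 2D lists, and the names `random_init`, `init_from_predicted`, and `shape_of` do not come from the paper or from PyTorch. The sketch only shows the general pattern of copying predicted values into a model where names and shapes match, and leaving the default initialization in place otherwise.

```python
import random


def shape_of(matrix):
    """Return (rows, cols) of a 2D list of floats."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def random_init(layer_shapes, seed=0):
    """Default initialization: small Gaussian noise for every layer."""
    rng = random.Random(seed)
    return {
        name: [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]
        for name, (rows, cols) in layer_shapes.items()
    }


def predict_parameters(layer_shapes):
    """Hypothetical placeholder for a parameter-prediction network.

    It returns the constant 0.5 for every entry so that copied values
    are easy to recognize; a real predictor would output learned values.
    """
    return {
        name: [[0.5] * cols for _ in range(rows)]
        for name, (rows, cols) in layer_shapes.items()
    }


def init_from_predicted(params, predicted):
    """Overwrite entries of `params` with predicted values.

    Only layers whose name and shape both match are replaced; all other
    layers keep their existing initialization. Returns the loaded names.
    """
    loaded = []
    for name, value in predicted.items():
        if name in params and shape_of(params[name]) == shape_of(value):
            params[name] = [row[:] for row in value]  # copy each row
            loaded.append(name)
    return loaded


layer_shapes = {"fc1": (4, 3), "fc2": (2, 4)}
params = random_init(layer_shapes)

# The shape for "fc2" is deliberately mismatched to exercise the guard.
predicted = predict_parameters({"fc1": (4, 3), "fc2": (5, 5)})

loaded = init_from_predicted(params, predicted)
print(loaded)
```

After this step, training would proceed as usual from the resulting parameters. In a PyTorch setting the same pattern would typically go through a model's state dictionary rather than a hand-written copy loop.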