DailyMAE: Towards Pretraining Masked Autoencoders in One Day
This work addresses a bottleneck in self-supervised learning research by making pretraining more accessible, particularly in academic environments. Its contribution is incremental, since it focuses on efficiency improvements rather than new methods.
The paper tackles the high computational demands of masked image modeling pretraining by proposing efficient training recipes that mitigate data loading bottlenecks and use progressive training techniques. These recipes enable training a MAE-Base/16 model on ImageNet 1K for 800 epochs in 18 hours on a single machine with 8 A100 GPUs, with speed gains of up to 5.8x.
Recently, masked image modeling (MIM), an important self-supervised learning (SSL) method, has drawn attention for its effectiveness in learning data representations from unlabeled data. Numerous studies underscore the advantages of MIM, highlighting how models pretrained on extensive datasets can enhance performance on downstream tasks. However, the high computational demands of pretraining pose significant challenges, particularly within academic environments, thereby impeding progress in SSL research. In this study, we propose efficient training recipes for MIM-based SSL that focus on mitigating data loading bottlenecks and employ progressive training techniques and other tricks to closely maintain pretraining performance. Our library enables the training of a MAE-Base/16 model on the ImageNet 1K dataset for 800 epochs within just 18 hours, using a single machine equipped with 8 A100 GPUs. By achieving speed gains of up to 5.8 times, this work not only demonstrates the feasibility of high-efficiency SSL training but also paves the way for broader accessibility and promotes advancement in SSL research, particularly for prototyping and initial testing of SSL ideas. The code is available at https://github.com/erow/FastSSL.
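To give a sense of scale for the headline numbers, the following back-of-the-envelope sketch estimates the sustained throughput that an 800-epoch run in 18 hours on 8 GPUs would require. It is an illustration rather than a measurement from the paper. It assumes the ImageNet 1K training split of 1,281,167 images, that every image is processed once per epoch, and that the 18 hours covers the whole run. The function names are hypothetical, and because the abstract reports speed gains of "up to" 5.8 times, the implied baseline time is only an upper-end estimate under the assumption that the full gain applies to this configuration.

```python
# Back-of-the-envelope throughput estimate (illustrative, not from the paper).
IMAGENET_1K_TRAIN_IMAGES = 1_281_167  # size of the ImageNet-1K training split


def required_throughput(epochs, hours, num_gpus,
                        images_per_epoch=IMAGENET_1K_TRAIN_IMAGES):
    """Return (total images/s, images/s per GPU) needed to finish in time.

    Assumes every training image is seen exactly once per epoch.
    """
    total_images = epochs * images_per_epoch
    seconds = hours * 3600
    total_ips = total_images / seconds
    return total_ips, total_ips / num_gpus


def implied_baseline_hours(fast_hours, speedup):
    """Wall-clock time of a baseline that is `speedup` times slower.

    Assumption: the full reported speedup applies to this configuration.
    """
    return fast_hours * speedup


total_ips, per_gpu_ips = required_throughput(epochs=800, hours=18, num_gpus=8)
baseline_hours = implied_baseline_hours(fast_hours=18, speedup=5.8)

print(f"total throughput:   {total_ips:,.0f} images/s")
print(f"per-GPU throughput: {per_gpu_ips:,.0f} images/s")
print(f"implied baseline:   {baseline_hours:.1f} hours")
```

Under these assumptions the run needs roughly 15,800 images per second in aggregate, or about 2,000 images per second per GPU. That rate suggests why data loading, and not only model compute, becomes a bottleneck worth optimizing.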