nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources
This work addresses the limited accessibility of language modeling research for researchers with constrained computational resources. Its contribution is incremental: it optimizes existing methods rather than introducing a new paradigm.
The paper tackles the high computational demands of T5 models by introducing nanoT5, a PyTorch framework that enables efficient pre-training and fine-tuning, achieving pre-training of a T5-Base model on a single GPU in 16 hours without performance loss.
State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands put them out of reach for a large portion of the research community. To address this challenge, we present nanoT5, a specially optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With the introduction of this open-source framework, we hope to widen access to language modeling research and meet the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. We make our contributions, including configurations, codebase, pre-training insights, and pre-trained models, available to the public.