CL, LG | May 2, 2024

Early Transformers: A study on Efficient Training of Transformer Models through Early-Bird Lottery Tickets

arXiv:2405.02353v1 | h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the high computational cost that practitioners face when training Transformer models in NLP and computer vision. The contribution is incremental: it extends an existing hypothesis to new architectures.

This paper tackles the resource-intensive training of Transformer models by applying the early-bird ticket hypothesis, showing that pruned models from early-bird tickets achieve comparable or superior accuracy while reducing memory usage.

The training of Transformer models has revolutionized natural language processing and computer vision, but it remains a resource-intensive and time-consuming process. This paper investigates the applicability of the early-bird ticket hypothesis to optimize the training efficiency of Transformer models. We propose a methodology that combines iterative pruning, masked distance calculation, and selective retraining to identify early-bird tickets in various Transformer architectures, including ViT, Swin-T, GPT-2, and RoBERTa. Our experimental results demonstrate that early-bird tickets can be consistently found within the first few epochs of training or fine-tuning, enabling significant resource optimization without compromising performance. The pruned models obtained from early-bird tickets achieve comparable or even superior accuracy to their unpruned counterparts while substantially reducing memory usage. Furthermore, our comparative analysis highlights the generalizability of the early-bird ticket phenomenon across different Transformer models and tasks. This research contributes to the development of efficient training strategies for Transformer models, making them more accessible and resource-friendly. By leveraging early-bird tickets, practitioners can accelerate the progress of natural language processing and computer vision applications while reducing the computational burden associated with training Transformer models.
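The abstract names iterative pruning and masked distance calculation but does not give details, so the following is only a minimal sketch of how an early-bird ticket might be detected. It assumes global magnitude pruning over a flat list of weights, a normalized Hamming distance between consecutive pruning masks, and a stopping rule that fires once that distance stays below a threshold for a few consecutive epochs. The function names and the default values of `prune_ratio`, `threshold`, and `window` are hypothetical and are not taken from the paper.

```python
from collections import deque


def magnitude_mask(weights, prune_ratio):
    """Return a 0/1 mask that zeroes out the smallest-magnitude weights."""
    n_prune = int(len(weights) * prune_ratio)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two equal-length masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    differing = sum(a != b for a, b in zip(mask_a, mask_b))
    return differing / len(mask_a)


def find_early_bird(weight_snapshots, prune_ratio=0.5, threshold=0.1, window=3):
    """Return the first epoch index at which the mask has stabilised.

    weight_snapshots holds one flat list of weights per epoch.
    The defaults are illustrative assumptions, not values from the paper.
    Returns None if the mask never stabilises.
    """
    recent = deque(maxlen=window)  # distances between consecutive masks
    previous = None
    for epoch, weights in enumerate(weight_snapshots):
        mask = magnitude_mask(weights, prune_ratio)
        if previous is not None:
            recent.append(mask_distance(previous, mask))
            # Stable once the last `window` distances are all below threshold.
            if len(recent) == window and max(recent) < threshold:
                return epoch
        previous = mask
    return None
```

In a real training loop the snapshots would come from the model's parameters at the end of each epoch, and the mask at the returned epoch would define the pruned network that is then retrained.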
