Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERT
This work addresses hyperparameter tuning for fine-tuned models, a topic relevant to NLP and computer vision practitioners. It is incremental: it builds on existing methods without introducing new paradigms.
This study examines how to optimize fine-tuning strategies for text classification with DistilBERT by evaluating learning rate, batch size, and number of epochs. It finds that interactions among these hyperparameters significantly affect accuracy, F1-score, and loss, and that the effects involve trade-offs: for example, a higher learning rate reduces loss (p=0.027) but makes accuracy gains harder to achieve.
This study evaluates fine-tuning strategies for text classification using the DistilBERT model, specifically the distilbert-base-uncased-finetuned-sst-2-english variant. Through structured experiments, we examine how learning rate, batch size, and number of epochs influence accuracy, F1-score, and loss. Polynomial regression analyses capture both the absolute (foundational) impact of these hyperparameters and their incremental impact relative to a baseline model. The results show that metrics vary with hyperparameter configuration and that improving one metric can come at the expense of another. For example, a higher learning rate reduces loss in the relative analysis (p=0.027) but makes accuracy improvements harder to achieve. Batch size significantly affects accuracy and F1-score in the absolute regression (p=0.028 and p=0.005, respectively) but has limited influence on loss (p=0.170). The interaction between epochs and batch size has the strongest effect on F1-score (p=0.001), underscoring the importance of hyperparameter interplay. These findings highlight the need for fine-tuning strategies that account for non-linear hyperparameter interactions in order to balance performance across metrics. Such variability and metric trade-offs are relevant beyond text classification, including other NLP tasks and computer vision. This analysis informs fine-tuning strategies for large language models and supports adaptive designs with broader applicability.
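To make the polynomial regression with interaction terms concrete, the following is a minimal sketch, not the study's actual analysis code. It assumes a degree-2 model with pairwise interactions over standardized hyperparameters, fitted by ordinary least squares. The hyperparameter grid, the helper names (`standardize`, `design_matrix`, `fit_polynomial`), and the synthetic F1 values are all hypothetical and serve only to show the mechanics.

```python
import itertools
import numpy as np

# Hypothetical term labels for a degree-2 model with pairwise interactions.
FEATURE_NAMES = [
    "intercept", "lr", "batch", "epochs",
    "lr^2", "batch^2", "epochs^2",
    "lr*batch", "lr*epochs", "batch*epochs",
]

def standardize(values):
    """Center to zero mean and scale to unit variance (keeps the fit well conditioned)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def design_matrix(lr, batch, epochs):
    """Build the columns: intercept, linear, squared, and pairwise interaction terms."""
    lr, batch, epochs = (np.asarray(v, dtype=float) for v in (lr, batch, epochs))
    return np.column_stack([
        np.ones_like(lr), lr, batch, epochs,
        lr**2, batch**2, epochs**2,
        lr * batch, lr * epochs, batch * epochs,
    ])

def fit_polynomial(lr, batch, epochs, metric):
    """Ordinary least-squares fit; returns one coefficient per named term."""
    X = design_matrix(lr, batch, epochs)
    coef, *_ = np.linalg.lstsq(X, np.asarray(metric, dtype=float), rcond=None)
    return dict(zip(FEATURE_NAMES, coef))

# Assumed 3 x 3 x 3 grid of configurations (not the grid used in the study).
grid = np.array(list(itertools.product([1e-5, 3e-5, 5e-5], [8, 16, 32], [1, 2, 3])))
lr = standardize(np.log10(grid[:, 0]))  # learning rate on a log scale
batch = standardize(grid[:, 1])
epochs = standardize(grid[:, 2])

# Synthetic, noise-free F1 values with a known epochs-by-batch interaction.
# These numbers are illustrative only and are NOT results from the study.
synthetic_f1 = 0.90 + 0.01 * batch - 0.005 * lr**2 + 0.02 * epochs * batch

coefs = fit_polynomial(lr, batch, epochs, synthetic_f1)
for name, value in coefs.items():
    print(f"{name:>13}: {value:+.4f}")
```

For a relative analysis of the kind described above, one plausible approach is to replace the metric with its difference from the baseline model's score before fitting. Obtaining p-values requires a statistics package on top of this sketch; for instance, the `pvalues` attribute of a fitted `statsmodels` OLS model could be applied to the same design matrix.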