CL · CV · LG · Oct 26, 2023

torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP

arXiv:2310.17644v1 · 133 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

This work addresses reproducibility challenges for researchers in NLP and computer vision by providing an upgraded, modular framework. The contribution is incremental, since it builds on existing tools.

The authors upgraded torchdistill, a coding-free deep learning framework, to support more tasks by integrating it with Hugging Face libraries, and demonstrated this by reproducing GLUE benchmark results for BERT models. All 27 fine-tuned BERT models and their configurations are published on Hugging Face, and the model weights are already widely used.

Reproducibility in scientific work has become increasingly important in research communities such as machine learning, natural language processing, and computer vision, owing to the rapid development of these research domains supported by recent advances in deep learning. In this work, we present a significantly upgraded version of torchdistill, a modular-driven, coding-free deep learning framework whose initial release supported only image classification and object detection tasks for reproducible knowledge distillation experiments. To demonstrate that the upgraded framework can support more tasks with third-party libraries, we reproduce the GLUE benchmark results of BERT models using a script based on the upgraded torchdistill, harmonizing with various Hugging Face libraries. All 27 fine-tuned BERT models and the configurations to reproduce the results are published at Hugging Face, and the model weights have already been widely used in research communities. We also reimplement popular small-sized models and new knowledge distillation methods and perform additional experiments for computer vision tasks.
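
For readers unfamiliar with the knowledge distillation experiments the abstract refers to, the sketch below shows the classic soft-target distillation loss in plain Python. It is an illustration only: the abstract does not say which losses the framework uses, and the function names `softmax` and `kd_loss`, as well as the `alpha` and `temperature` defaults, are hypothetical choices made here, not part of torchdistill or any Hugging Face library.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Illustrative distillation loss for a single example.

    Combines cross-entropy on the hard label with KL(teacher || student)
    on temperature-softened distributions. alpha and temperature are
    assumed values chosen for this sketch.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])

    # Soft-target term: KL divergence between softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0
    )

    # The T^2 factor keeps the soft-term gradient scale comparable
    # across temperatures.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

When the student and teacher logits are identical, the soft-target term is zero and only the hard-label cross-entropy remains. A larger `temperature` flattens both distributions, so the student also learns from the teacher's relative preferences among incorrect classes.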

Code Implementations: 1 repo