DC · CV · LG · PF · Sep 27, 2022

An Overview of the Data-Loader Landscape: Comparative Performance Analysis

arXiv:2209.13705v1 · 7 citations · h-index: 14
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for optimized data loading in machine-learning training, but its contribution is incremental: it focuses on comparative analysis rather than introducing new methods.

The paper tackles the problem of dataloaders becoming a bottleneck in deep-learning workflows. It treats the dataloader as a separate component and provides a comprehensive comparative analysis of the available libraries, yielding insights into their trade-offs in functionality, usability, and performance.
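A performance comparison of dataloaders usually comes down to how many samples per second each one delivers over the same data. The sketch below shows one way such a measurement could be set up. It is not the paper's benchmarking harness: the helper names `measure_throughput` and `compare_loaders`, the "best of N repeats" policy, and the toy loaders are all hypothetical, and a loader is assumed to be any factory that returns a fresh iterable of batches supporting `len()`.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Sized

# Hypothetical convention: a loader factory returns a fresh iterable of batches,
# and each batch supports len() so we can count samples rather than batches.
LoaderFactory = Callable[[], Iterable[Sized]]


def measure_throughput(make_loader: LoaderFactory, max_batches: Optional[int] = None) -> float:
    """Time one pass over a fresh loader and return samples per second."""
    n_samples = 0
    start = time.perf_counter()
    for i, batch in enumerate(make_loader()):
        n_samples += len(batch)  # count samples, not batches
        if max_batches is not None and i + 1 >= max_batches:
            break
    elapsed = time.perf_counter() - start
    return n_samples / elapsed if elapsed > 0 else float("inf")


def compare_loaders(loaders: Dict[str, LoaderFactory], repeats: int = 3) -> Dict[str, float]:
    """Return the best (highest) throughput observed for each named loader."""
    return {
        name: max(measure_throughput(factory) for _ in range(repeats))
        for name, factory in loaders.items()
    }


if __name__ == "__main__":
    def fast_loader():
        # In-memory batches: no simulated I/O cost.
        return ([k] * 32 for k in range(50))

    def slow_loader():
        # Simulated storage latency of about 1 ms per batch.
        for k in range(50):
            time.sleep(0.001)
            yield [k] * 32

    results = compare_loaders({"in-memory": fast_loader, "simulated-io": slow_loader})
    for name, sps in results.items():
        print(f"{name}: {sps:,.0f} samples/s")
```

Keeping the best of several runs is one common way to reduce noise from warm-up effects such as cold caches. A real comparison would also fix the dataset, batch size, and worker count across libraries.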

Dataloaders, which are in charge of moving data from storage into GPUs while machine-learning models are trained, might hold the key to drastically improving the performance of training jobs. Recent advances have shown promise not only by considerably decreasing training time but also by offering new features such as loading data from remote storage like S3. In this paper, we are the first to distinguish the dataloader as a separate component in the Deep Learning (DL) workflow and to outline its structure and features. Finally, we offer a comprehensive comparison of the different dataloading libraries available, their trade-offs in terms of functionality, usability, and performance, and the insights derived from them.
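To make "the dataloader as a separate component" concrete, here is a minimal sketch of the stages such a component typically chains together. Samples are read from storage, grouped into batches, prepared ahead of time in a background worker so that I/O overlaps with training, and finally handed to the device. The class `PrefetchingLoader` and its parameters (`read_sample`, `prefetch`, `to_device`) are hypothetical illustrations, not the structure defined in the paper or any library's API. `to_device` is only a placeholder for the host-to-GPU copy (in PyTorch, for example, this step would be a `.to("cuda")` call). PyTorch's `DataLoader` applies a similar idea using worker processes and a `prefetch_factor` setting rather than a single thread.

```python
import queue
import threading
from typing import Any, Callable, Iterator, List, Sequence

_DONE = object()  # end-of-epoch marker passed through the queue


class PrefetchingLoader:
    """Hypothetical minimal dataloader: read samples, group them into batches,
    and prepare batches in a background thread so I/O overlaps with consumption."""

    def __init__(
        self,
        indices: Sequence[int],
        read_sample: Callable[[int], Any],
        batch_size: int,
        prefetch: int = 2,
        to_device: Callable[[List[Any]], Any] = lambda batch: batch,
    ) -> None:
        if batch_size <= 0 or prefetch <= 0:
            raise ValueError("batch_size and prefetch must be positive")
        self.indices = list(indices)
        self.read_sample = read_sample  # storage stage: fetch and decode one sample
        self.batch_size = batch_size
        self.prefetch = prefetch        # max number of ready batches held in the queue
        self.to_device = to_device      # placeholder for the host-to-GPU copy

    def _produce(self, q: "queue.Queue[Any]") -> None:
        try:
            for start in range(0, len(self.indices), self.batch_size):
                chunk = self.indices[start:start + self.batch_size]
                q.put([self.read_sample(i) for i in chunk])  # blocks while the queue is full
        except Exception as exc:
            q.put(exc)  # forward worker errors to the consumer
        finally:
            q.put(_DONE)

    def __iter__(self) -> Iterator[Any]:
        # A fresh queue and worker per epoch, so the loader can be iterated repeatedly.
        q: "queue.Queue[Any]" = queue.Queue(maxsize=self.prefetch)
        worker = threading.Thread(target=self._produce, args=(q,), daemon=True)
        worker.start()
        while True:
            item = q.get()
            if item is _DONE:
                break
            if isinstance(item, Exception):
                raise item
            yield self.to_device(item)
        worker.join()
```

For example, `list(PrefetchingLoader(range(10), lambda i: i * i, batch_size=4))` yields the batches `[0, 1, 4, 9]`, `[16, 25, 36, 49]`, and `[64, 81]`. The bounded queue caps memory use at `prefetch` ready batches. In this sketch, a consumer that stops early leaves the daemon worker blocked rather than cleanly shut down, which a production loader would need to handle.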

Code implementations: 1 repo