SE · AI · PF · Nov 10, 2025

Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines: A Comparative Analysis

arXiv:2511.08644v2 · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This work addresses the energy efficiency of data processing for deep learning practitioners, but its contribution is incremental: it extends existing comparative analyses by adding energy metrics.

This paper compares the performance of Python data manipulation libraries (Pandas, Polars, Dask) in deep learning pipelines, reporting runtime, memory usage, disk usage, and energy consumption across various models and datasets.

This paper presents a detailed comparative analysis of three major Python data manipulation libraries (Pandas, Polars, and Dask) when embedded within complete deep learning (DL) training and inference pipelines. It addresses a gap in the existing literature by studying how these libraries interact with substantial GPU workloads during critical phases such as data loading, preprocessing, and batch feeding. The authors measure key performance indicators, namely runtime, memory usage, disk usage, and energy consumption (both CPU and GPU), across various machine learning models and datasets.
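
The paper measures these indicators per pipeline phase, but this summary does not describe the authors' tooling. As a rough, hypothetical illustration only, the sketch below wraps one phase in a wall-clock timer, Python's `tracemalloc`, and a read of the Linux RAPL energy counter. The RAPL path is an assumption: it exists on many Intel Linux systems but may be missing or readable only by root, in which case energy is reported as `None`. `tracemalloc` only sees allocations that go through Python's allocator, so memory held by native engines such as Polars would be under-counted; GPU energy would need a vendor interface such as NVIDIA NVML and is left out. `measure_phase` and `preprocess` are illustrative names, and `preprocess` is a stand-in for a dataframe step.

```python
import time
import tracemalloc
from pathlib import Path
from typing import Any, Callable, Dict, Optional

# Assumed path: Linux powercap/RAPL exposes cumulative CPU-package energy in
# microjoules on many Intel systems. It may be absent or root-only.
RAPL_ENERGY_FILE = Path("/sys/class/powercap/intel-rapl:0/energy_uj")


def read_cpu_energy_uj() -> Optional[int]:
    """Return the cumulative RAPL energy counter (microjoules), or None if unavailable."""
    try:
        return int(RAPL_ENERGY_FILE.read_text().strip())
    except (OSError, ValueError):
        return None


def measure_phase(fn: Callable[[], Any]) -> Dict[str, Any]:
    """Run one pipeline phase; record runtime, peak traced memory, and CPU energy."""
    energy_before = read_cpu_energy_uj()
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
    finally:
        elapsed = time.perf_counter() - start
        # Peak only covers allocations visible to tracemalloc (Python allocator).
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    energy_after = read_cpu_energy_uj()

    energy_j: Optional[float] = None
    if energy_before is not None and energy_after is not None and energy_after >= energy_before:
        # Counter wraparound is not handled; a wrapped reading yields None.
        energy_j = (energy_after - energy_before) / 1e6

    return {
        "result": result,
        "runtime_s": elapsed,
        "peak_mem_bytes": peak_bytes,
        "cpu_energy_j": energy_j,
    }


def preprocess() -> list:
    # Hypothetical stand-in for a dataframe preprocessing step:
    # min-max scale a single numeric column held in a plain list.
    values = [float(i) for i in range(100_000)]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


stats = measure_phase(preprocess)
print(
    f"runtime={stats['runtime_s']:.3f}s "
    f"peak={stats['peak_mem_bytes'] / 1e6:.1f}MB "
    f"cpu_energy={stats['cpu_energy_j']}"
)
```

In a real comparison, the same wrapper would be applied to equivalent Pandas, Polars, and Dask implementations of each phase, with process-level memory sampling and GPU counters substituted where `tracemalloc` and RAPL fall short.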

