PF · LG · May 9

Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

arXiv:2605.08731 · 11.2
Predicted impact: top 61% in PF · last 90 days · Originality: Synthesis-oriented
AI Analysis

For ML practitioners selecting JPEG decoders, this work demonstrates that standard microbenchmarks are misleading and provides a more realistic evaluation methodology.

The paper shows that single-thread JPEG decoder benchmarks misrepresent real-world ML data loader performance: decoder rankings change substantially when the same decoders are evaluated inside a multi-worker PyTorch DataLoader across five CPU architectures. For example, on Neoverse V2 imageio ranks ninth in single-thread throughput but joins torchvision in the top DataLoader tier, while on Zen 4 torchvision rises from seventh single-thread to the top measured DataLoader tier.

JPEG decode is routine ML infrastructure, but Python decoder choices are often justified by single-process, single-thread microbenchmarks. We audit this evaluation assumption with twelve Python-accessible JPEG decode paths on five matched 16 vCPU Google Cloud CPUs: Intel Emerald Rapids, AMD Zen 4, AMD Zen 5, ARM Neoverse V2, and ARM Neoverse N1. ImageNet validation is the workload, not a new dataset contribution: each run decodes the full 50,000-image split from memory and reports single-thread throughput for all decoders, PyTorch DataLoader throughput for eligible decoders at worker counts {0,2,4,8}, and decoder skip behavior. The evaluation protocol changes the supported conclusion. On Neoverse V2, imageio is ninth in single-thread throughput yet lands in the top DataLoader tier with torchvision; on Zen 4, torchvision rises from seventh single-thread to the top measured DataLoader tier; on Neoverse N1, imagecodecs is the single-thread leader but fourth at peak DataLoader throughput. We also find that worker-count conclusions differ between Zen 4 and Zen 5, TensorFlow has a large single-thread ARM penalty, and strict libjpeg-turbo-family wrappers reject the same rare ImageNet JPEG. For PyTorch DataLoader workloads, torchvision and simplejpeg form the strongest measured zero-skip tier: torchvision has the highest mean normalized throughput, while simplejpeg has the highest minimum. OpenCV remains a robust general-purpose fallback above 90% of the platform-local winner on every tested CPU. We release raw JSON, generated tables/figures, and an executable local/cloud benchmark framework.
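The cross-platform summary in the abstract (highest mean normalized throughput versus highest minimum) can be illustrated with a small sketch. It assumes that "normalized" means each decoder's throughput divided by the best throughput measured on the same CPU, which is one reading of "above 90% of the platform-local winner" and is not confirmed by the abstract. The platform names, decoder names, and numbers below are hypothetical placeholders, not results from the paper.

```python
from statistics import mean

# Hypothetical DataLoader throughput in images/s (NOT data from the paper).
throughput = {
    "cpu_a": {"decoder_x": 1000.0, "decoder_y": 920.0, "decoder_z": 700.0},
    "cpu_b": {"decoder_x": 1000.0, "decoder_y": 920.0, "decoder_z": 700.0},
    "cpu_c": {"decoder_x": 850.0, "decoder_y": 920.0, "decoder_z": 1000.0},
}


def normalize_per_platform(raw):
    """Divide each decoder's throughput by the best value on the same platform.

    Assumption: this matches the paper's notion of normalized throughput.
    """
    normalized = {}
    for platform, by_decoder in raw.items():
        best = max(by_decoder.values())
        normalized[platform] = {
            name: value / best for name, value in by_decoder.items()
        }
    return normalized


def summarize(normalized):
    """Return the mean and minimum normalized score for each decoder.

    Only decoders measured on every platform are included.
    """
    common = set.intersection(*(set(d) for d in normalized.values()))
    summary = {}
    for name in sorted(common):
        scores = [normalized[platform][name] for platform in normalized]
        summary[name] = {"mean": mean(scores), "min": min(scores)}
    return summary


summary = summarize(normalize_per_platform(throughput))
best_mean = max(summary, key=lambda name: summary[name]["mean"])
best_min = max(summary, key=lambda name: summary[name]["min"])

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.3f} min={stats['min']:.3f}")
print("highest mean:", best_mean)
print("highest minimum:", best_min)
```

With these placeholder numbers the two criteria pick different decoders: one wins on two platforms but drops on the third, while another never wins yet never falls far behind. That is the kind of distinction the abstract draws between the highest-mean and highest-minimum decoders.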
