LG Jun 16, 2024

Enriching the Machine Learning Workloads in BigBench

arXiv:2406.10843v1
Originality: Synthesis-oriented
AI Analysis

This work provides a standardized benchmark for evaluating machine learning technologies in big data systems. The contribution is incremental, as it builds on the existing BigBench V2.

The authors extend the BigBench V2 benchmark with three new workloads that use multiple machine learning algorithms and compare implementations of the same algorithm across popular libraries such as MLlib, SystemML, Scikit-learn, and Pandas, demonstrating the benchmark's relevance and usability.

In the era of Big Data, with growing support for Machine Learning, Deep Learning and Artificial Intelligence algorithms in current software systems, there is an urgent need for standardized application benchmarks that stress test and evaluate these new technologies. Relying on the standardized BigBench (TPCx-BB) benchmark, this work enriches the improved BigBench V2 with three new workloads and expands its coverage of machine learning algorithms. Our workloads use multiple algorithms and compare different implementations of the same algorithm across several popular libraries such as MLlib, SystemML, Scikit-learn and Pandas, demonstrating the relevance and usability of our benchmark extension.

