Enriching the Machine Learning Workloads in BigBench
This work provides a standardized benchmark for evaluating machine learning technologies in big data systems. The contribution is incremental, as it builds upon the existing BigBench V2.
The authors extend the BigBench V2 benchmark with three new workloads that use multiple machine learning algorithms and compare implementations of the same algorithm across popular libraries such as MLlib, SystemML, Scikit-learn, and Pandas, demonstrating the benchmark's relevance and usability.
In the era of Big Data, with growing support for Machine Learning, Deep Learning and Artificial Intelligence algorithms in current software systems, there is an urgent need for standardized application benchmarks that stress test and evaluate these new technologies. Building on the standardized BigBench (TPCx-BB) benchmark, this work enriches the improved BigBench V2 with three new workloads and expands its coverage of machine learning algorithms. Our workloads use multiple algorithms and compare different implementations of the same algorithm across several popular libraries such as MLlib, SystemML, Scikit-learn and Pandas, demonstrating the relevance and usability of our benchmark extension.