DC LG PF ML

Mar 18, 2020

ContainerStress: Autonomous Cloud-Node Scoping Framework for Big-Data ML Use Cases

Guang Chao Wang, Kenny Gross, Akshay Subramaniam

arXiv:2003.08011v11.21 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a specific problem for cloud vendors and customers, namely optimizing resource allocation for ML deployments, but it appears incremental because it builds on existing simulation and benchmarking techniques.

The authors address the challenge of sizing cloud containers for big-data ML services with an automated framework that uses nested-loop Monte Carlo simulation to autonomously scale customer use cases across CPU-GPU configurations. An accompanying benchmark study analyzes compute cost and GPU acceleration to assess the resulting cost reductions.

Deploying big-data Machine Learning (ML) services in a cloud environment presents a challenge to the cloud vendor with respect to sizing the cloud container configuration for any given customer use case. OracleLabs has developed an automated framework that uses nested-loop Monte Carlo simulation to autonomously scale customer ML use cases of any size across the range of cloud CPU-GPU "Shapes" (configurations of CPUs and/or GPUs in cloud containers available to end customers). Moreover, the OracleLabs and NVIDIA authors have collaborated on an ML benchmark study which analyzes the compute cost and GPU acceleration of any ML prognostic algorithm and assesses the reduction of compute cost in a cloud container comprising conventional CPUs and NVIDIA GPUs.
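
The abstract does not spell out the loop structure, so the following is only a minimal sketch of what a nested-loop Monte Carlo scoping sweep could look like. The swept parameters (number of signals, number of observations), the function names `standin_workload` and `sweep`, and the toy workload itself are all assumptions made for illustration; they are not taken from the paper or from any Oracle or NVIDIA tool. The idea shown is simply: for each cell in a grid of use-case sizes, run a few randomized trials and record a typical compute time, giving a cost surface that could then be compared across container shapes.

```python
import random
import statistics
import time


def standin_workload(n_signals, n_observations, rng):
    """Hypothetical stand-in for a customer's ML algorithm (an assumption):
    generates synthetic signals and sums all pairwise dot products, so the
    cost grows with both the signal count and the observation count."""
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_observations)]
            for _ in range(n_signals)]
    total = 0.0
    for i in range(n_signals):
        for j in range(i, n_signals):
            total += sum(a * b for a, b in zip(data[i], data[j]))
    return total


def sweep(signal_counts, observation_counts, trials=3, seed=0):
    """Nested-loop sweep: for every (signals, observations) cell, run several
    randomized trials and keep the median wall-clock time in seconds.
    The timing includes synthetic data generation."""
    rng = random.Random(seed)
    surface = {}
    for n_signals in signal_counts:                 # outer loop
        for n_observations in observation_counts:   # inner loop
            timings = []
            for _ in range(trials):                 # Monte Carlo repetitions
                start = time.perf_counter()
                standin_workload(n_signals, n_observations, rng)
                timings.append(time.perf_counter() - start)
            surface[(n_signals, n_observations)] = statistics.median(timings)
    return surface


if __name__ == "__main__":
    for cell, seconds in sorted(sweep([4, 8, 16], [100, 200]).items()):
        print(cell, f"{seconds:.6f} s")
```

Running the same sweep on each candidate shape and comparing the resulting surfaces is one plausible way to pick the smallest configuration that meets a use case's time budget. The median is used here only because it is less sensitive to timing outliers than the mean.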
