RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances
RIBBON addresses the cost and performance challenges faced by businesses and scientific processes that rely on deep learning inference services, and it represents an incremental improvement over approaches that use homogeneous instance pools.
The paper tackles the problem of balancing quality-of-service (QoS) and cost-effectiveness in deep learning model inference by introducing RIBBON, a system that uses a diverse pool of cloud computing instances, achieving up to 16% cost savings for models like recommender systems and drug-discovery models.
Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces RIBBON, a novel deep learning inference serving system that meets two competing objectives: satisfying a quality-of-service (QoS) target and remaining cost-effective. The key idea behind RIBBON is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. RIBBON devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms. RIBBON demonstrates its superiority over existing inference serving approaches that use homogeneous instance pools, saving up to 16% of the inference service cost for different learning models, including emerging deep learning recommender system models and drug-discovery enabling models.
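To make the pool-selection problem concrete, the following is a minimal sketch, not RIBBON's implementation. It substitutes a plain exhaustive search for the Bayesian Optimization strategy described above, and it simplifies the QoS target to an aggregate-throughput requirement. The instance names, prices, and throughput figures are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical catalog (assumed values, not from the paper):
# name -> (price in cents per hour, queries/second one instance
#          can serve while staying within the latency target)
CATALOG = {
    "large": (100, 100),
    "small": (35, 25),
}


def pool_cost(pool):
    """Hourly cost in cents of a pool given as {instance name: count}."""
    return sum(CATALOG[name][0] * count for name, count in pool.items())


def pool_capacity(pool):
    """Aggregate queries/second the pool can serve (simplified QoS proxy)."""
    return sum(CATALOG[name][1] * count for name, count in pool.items())


def cheapest_pool(target_qps, max_per_type=10, allowed=None):
    """Exhaustively search instance counts for the cheapest feasible pool.

    This brute-force loop stands in for the Bayesian Optimization search;
    it is only practical because the toy search space is tiny.
    Returns (cost, pool), or None if no pool meets the target.
    """
    names = list(allowed or CATALOG)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        pool = dict(zip(names, counts))
        if pool_capacity(pool) < target_qps:
            continue  # pool violates the simplified QoS requirement
        cost = pool_cost(pool)
        if best is None or cost < best[0]:
            best = (cost, pool)
    return best


if __name__ == "__main__":
    print("heterogeneous:", cheapest_pool(225))
    print("large only:   ", cheapest_pool(225, allowed=["large"]))
    print("small only:   ", cheapest_pool(225, allowed=["small"]))
```

With these made-up numbers and a target of 225 queries per second, the cheapest mixed pool (two large instances plus one small) costs 235 cents per hour, against 300 for the cheapest large-only pool and 315 for the cheapest small-only pool. In a realistic setting each candidate pool must be evaluated by deploying or simulating it, and the number of instance types and counts is much larger, which is why a sample-efficient search such as Bayesian Optimization is attractive in place of enumeration.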