Rohan Basu Roy

DC, Jul 23, 2022

RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances

Baolin Li, Rohan Basu Roy, Tirthak Patel et al.

Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces RIBBON, a novel deep learning inference serving system that meets two competing objectives: a quality-of-service (QoS) target and cost-effectiveness. The key idea behind RIBBON is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. RIBBON devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms, and it outperforms existing inference serving systems that use homogeneous instance pools. RIBBON saves up to 16% of the inference service cost across different learning models, including emerging deep learning recommender system models and drug-discovery enabling models.
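
To make the heterogeneous-pool idea concrete, here is a minimal sketch of the underlying search problem. It is not RIBBON's method: an exhaustive search stands in for the Bayesian Optimization strategy, QoS is reduced to a single throughput requirement, and the instance names, prices, and capacities in `CATALOG` are hypothetical values chosen only for illustration.

```python
from itertools import product

# Hypothetical catalog (assumption, not from the paper):
# name -> (hourly price in USD, queries/sec served within the QoS target)
CATALOG = {
    "large": (1.00, 100),
    "small": (0.30, 25),
}

def cheapest_pool(catalog, demand_qps, max_per_type=10):
    """Return (cost, counts) for the cheapest pool whose capacity covers demand_qps.

    Exhaustive search over instance counts; a simple stand-in for a
    smarter optimizer such as Bayesian Optimization.
    """
    names = list(catalog)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        capacity = sum(n * catalog[name][1] for n, name in zip(counts, names))
        if capacity < demand_qps:
            continue  # this pool would miss the (simplified) QoS target
        cost = sum(n * catalog[name][0] for n, name in zip(counts, names))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, counts)))
    return best

cost, pool = cheapest_pool(CATALOG, demand_qps=225)
print(pool, round(cost, 2))
```

With these made-up numbers, the mixed pool of two `large` and one `small` instance is cheaper than either homogeneous option (three `large` or nine `small`). This is the kind of saving a heterogeneous pool can offer under this simplified model.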

62.7, QUANT-PH, May 12

TuniQ: Autotuning Compilation Passes for Quantum Workloads at Scale for Effectiveness and Efficiency

Mohammad Abrarul Hasanat, Jason Ludmir, Tirthak Patel et al.

Quantum processors are being integrated into HPC ecosystems as co-processors, where compilation of quantum circuits into hardware-executable form determines both output fidelity and runtime. Current compilers use a fixed pass sequence and ignore the fact that optimal pass selection varies with circuit, hardware, and noise conditions. We present TuniQ, a reinforcement learning-based system that selects compilation passes at each pipeline stage, adapting to the circuit, the backend, and the current noise profile. TuniQ introduces several novel design components, including a dual-encoder for stage-aware representation, shaped rewards for cross-stage credit assignment, and dynamic action masking for valid compilation. Evaluated across diverse quantum workloads on multiple IBM Quantum Cloud processors, TuniQ improves fidelity and reduces compilation time over the state-of-the-art IBM Qiskit transpiler, generalizes across backends without retraining, and scales to utility-scale circuits, where its advantage grows.
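
As a rough illustration of what dynamic action masking means in a reinforcement learning policy, the sketch below forces the probability of invalid actions to zero before an action is chosen. It is a generic example rather than TuniQ's implementation: the action names, logits, and validity flags are hypothetical, and no Qiskit API is used.

```python
import math

def masked_policy(logits, valid):
    """Softmax over logits with invalid actions forced to probability zero.

    Assumes at least one action is valid.
    """
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical stage: three candidate passes, one of which is not allowed yet.
actions = ["route", "optimize", "schedule"]
logits = [2.0, 1.0, 3.0]          # made-up policy scores
valid = [True, True, False]       # "schedule" is masked out at this stage

probs = masked_policy(logits, valid)
choice = actions[probs.index(max(probs))]
print(choice, probs)
```

Although `schedule` has the highest raw score, the mask removes it, so the greedy choice falls to `route`. Because the mask is recomputed at each stage, the policy only ever selects an action that is valid at that point.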
