DC, LG · Nov 22, 2023

A Survey of Serverless Machine Learning Model Inference

arXiv:2311.13587v1 · 9 citations · h-index: 1
Originality: Synthesis-oriented
AI Analysis

The survey provides a taxonomy and a summary of trends for researchers and practitioners in AI deployment, but as a survey its contribution is incremental.

This survey addresses the challenge of deploying large AI models in production with serverless architectures while meeting Service Level Objectives, by summarizing emerging challenges and optimization opportunities for deep learning serving systems.

Recent developments in Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products. This widespread adoption of AI requires significant effort to deploy these models in production environments. When hosting machine learning models for real-time predictions, it is important to meet defined Service Level Objectives (SLOs), ensuring reliability and minimal downtime while optimizing the operational costs of the underlying infrastructure. Large machine learning models often demand GPU resources for efficient inference to meet SLOs. In the context of these trends, there is growing interest in hosting AI models in a serverless architecture while still providing GPU access for inference tasks. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems. By providing a novel taxonomy and summarizing recent trends, we hope that this survey can shed light on new optimization perspectives and motivate novel work on large-scale deep learning serving systems.

