DC · LG · ML · Jul 14, 2020

Serverless inferencing on Kubernetes

arXiv:2007.07366v2 · 10 citations
AI Analysis

This work addresses infrastructure cost and deployment complexity for organizations scaling ML models in production. Its contribution is incremental, since it builds on existing serverless paradigms such as KNative.

The paper tackles the challenge of deploying machine learning models efficiently at scale by presenting KFServing, a serverless inference solution on Kubernetes. KFServing reduces infrastructure costs through scale-to-zero, addresses autoscaling for GPU-based inference, and gives data scientists a consistent interface for deploying models.

Organisations are increasingly putting machine learning models into production at scale. The increasing popularity of serverless scale-to-zero paradigms presents an opportunity to mitigate infrastructure costs when deploying machine learning models, many of which may not be in continuous use. We will discuss the KFServing project, which builds on the KNative serverless paradigm to provide a serverless machine learning inference solution with a consistent and simple interface for data scientists to deploy their models. We will show how it solves the challenges of autoscaling GPU-based inference and discuss some of the lessons learnt from using it in production.
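
To make the scale-to-zero idea in the abstract concrete, here is a minimal sketch of a concurrency-based replica calculation. It is not KFServing's or KNative's actual autoscaler code. It assumes the replica count is derived from the number of in-flight requests divided by a per-replica concurrency target, and that a model with no traffic is scaled down to zero replicas. The function name `desired_replicas` and its parameters are hypothetical and chosen only for illustration.

```python
import math


def desired_replicas(in_flight_requests: float,
                     target_concurrency: float,
                     max_replicas: int = 10) -> int:
    """Return how many model-server replicas to run (illustrative only).

    in_flight_requests: observed concurrent requests for one model.
    target_concurrency: assumed number of requests one replica should handle at once.
    max_replicas: assumed upper bound, e.g. the number of GPUs available.
    """
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    if in_flight_requests <= 0:
        # No traffic: scale to zero so an idle model holds no resources.
        return 0
    # Enough replicas to keep each one at or below the concurrency target.
    needed = math.ceil(in_flight_requests / target_concurrency)
    return min(max_replicas, needed)


if __name__ == "__main__":
    for load in (0, 1, 9, 100):
        print(load, desired_replicas(load, target_concurrency=4))
```

Scaling on request concurrency rather than CPU utilisation is one plausible way to handle GPU-backed inference, where CPU load can be a poor signal of how busy a model server is. The cost saving comes from the zero branch: a model that receives no requests consumes no replicas until traffic returns.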
