DC · LG · Jan 17, 2024

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

arXiv:2401.12230v1 · 14 citations · h-index: 7
Originality: Synthesis-oriented
AI Analysis

The paper addresses cost and scalability issues for providers and users of large AI models, though its contribution is incremental, since it builds on existing cloud and AI concepts.

The paper tackles the high costs and resource demands of large generative AI models such as ChatGPT by proposing an AI-native computing paradigm that integrates cloud-native technologies with machine learning runtime optimizations, aiming to reduce cost of goods sold (COGS) and improve accessibility.

In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges like escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses the power of both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtime (e.g., batched LoRA inference). These joint efforts aim to optimize costs-of-goods-sold (COGS) and improve resource accessibility. The journey of merging these two domains is just at the beginning and we hope to stimulate future research and development in this area.
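As a rough illustration of the "batched LoRA inference" idea named in the abstract, the sketch below serves several tenants' adapters in one batch: the base weight is multiplied once for all requests, and each request adds only its own low-rank correction. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation; the function names, array shapes, and the `scale` parameter are all hypothetical.

```python
import numpy as np


def batched_lora_forward(x, adapter_ids, W, A, B, scale=1.0):
    """Apply one shared base layer plus a per-request LoRA adapter.

    Assumed (hypothetical) shapes:
      x:           (n, d_in)      one row per request
      adapter_ids: (n,)           which tenant adapter each request uses
      W:           (d_out, d_in)  base weight shared by every tenant
      A:           (k, r, d_in)   LoRA down-projections, one per adapter
      B:           (k, d_out, r)  LoRA up-projections, one per adapter
    """
    base = x @ W.T                                # single shared matmul
    A_sel = A[adapter_ids]                        # gather: (n, r, d_in)
    B_sel = B[adapter_ids]                        # gather: (n, d_out, r)
    low = np.einsum("nri,ni->nr", A_sel, x)       # project down to rank r
    delta = np.einsum("nor,nr->no", B_sel, low)   # project back up
    return base + scale * delta


def reference_forward(x, adapter_ids, W, A, B, scale=1.0):
    """Unbatched baseline: materialize each request's merged weight."""
    rows = []
    for xi, k in zip(x, adapter_ids):
        W_eff = W + scale * (B[k] @ A[k])
        rows.append(W_eff @ xi)
    return np.stack(rows)


# Toy data: 5 requests spread over 3 tenant adapters of rank 2.
rng = np.random.default_rng(0)
n, d_in, d_out, k, r = 5, 8, 6, 3, 2
x = rng.normal(size=(n, d_in))
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(k, r, d_in))
B = rng.normal(size=(k, d_out, r))
adapter_ids = np.array([0, 2, 1, 2, 0])

batched = batched_lora_forward(x, adapter_ids, W, A, B)
reference = reference_forward(x, adapter_ids, W, A, B)
```

The batched path never builds a merged weight per tenant, which is what lets requests for different adapters share one copy of the base model in this sketch; the reference loop is included only to check that both paths agree numerically.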
