DC, LG · Jan 30, 2025

Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions

arXiv:2502.12017v1 · h-index: 6 · ICSOC Workshops
Originality: Incremental advance
AI Analysis

This paper addresses scalability and cost-efficiency problems for data-intensive batch applications in limited-resource environments. It is an incremental advance that applies serverless computing to ML inference.

The paper tackles scalability and cost challenges in ML inference by using serverless functions for parallel batch processing. In a sentiment analysis case study, it reports an execution-time reduction of over 95% compared to a monolithic approach, at the same cost.
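
The "same cost" outcome is plausible under pay-per-use billing. As an idealized sketch (the symbols are illustrative assumptions, not taken from the paper), let $T$ be the monolithic run time, $N$ the number of parallel functions, $o$ a per-invocation overhead, and $p$ the price per unit of billed compute time at a fixed memory size:

\[
C_{\text{mono}} = p\,T, \qquad
C_{\text{par}} = p \sum_{i=1}^{N} \left( \frac{T}{N} + o \right) = p\,(T + N o), \qquad
t_{\text{par}} = \frac{T}{N} + o .
\]

With negligible overhead ($o \approx 0$), cost is unchanged while wall-clock time falls by $1 - 1/N$, so a reduction of over 95% would need more than 20 evenly loaded functions in this model.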

As data-intensive applications grow, batch processing in limited-resource environments faces scalability and resource management challenges. Serverless computing offers a flexible alternative, enabling dynamic resource allocation and automatic scaling. This paper explores how serverless architectures can make large-scale ML inference tasks faster and more cost-effective by decomposing monolithic processes into parallel functions. Through a case study on sentiment analysis using the DistilBERT model and the IMDb dataset, we demonstrate that serverless parallel processing can reduce execution time by over 95% compared to monolithic approaches, at the same cost.
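
The decomposition described above can be illustrated with a minimal local sketch. It is not the paper's implementation: a thread pool stands in for concurrent serverless invocations, a keyword counter stands in for DistilBERT, and the names `classify`, `handler`, `chunk`, and `fan_out` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence

# Assumption: a toy keyword classifier replaces the real model so the
# sketch runs with the standard library only.
POSITIVE = {"great", "good", "excellent", "loved"}
NEGATIVE = {"bad", "awful", "boring", "hated"}


def classify(review: str) -> str:
    """Placeholder for model inference on a single review."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"


def handler(batch: Sequence[str]) -> List[str]:
    """Hypothetical function entry point: one batch in, one list of labels out."""
    return [classify(review) for review in batch]


def chunk(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Split the dataset into fixed-size batches, one per invocation."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fan_out(reviews: Sequence[str], batch_size: int, max_workers: int = 8) -> List[str]:
    """Process batches concurrently and merge the results in input order."""
    batches = list(chunk(reviews, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the batches were submitted.
        per_batch = list(pool.map(handler, batches))
    return [label for labels in per_batch for label in labels]


if __name__ == "__main__":
    reviews = ["great movie", "awful plot", "loved it"]
    # The monolithic baseline is a single handler call over the whole dataset.
    assert fan_out(reviews, batch_size=2) == handler(reviews)
    print(fan_out(reviews, batch_size=2))
```

In a real deployment each `handler` call would be a separate function invocation, and the batch size would determine how many functions run in parallel.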

