Valerio Besozzi

h-index2
2papers

2 Papers

DCFeb 6Code
Reinforcement Learning-Based Dynamic Management of Structured Parallel Farm Skeletons on Serverless Platforms

Lanpei Li, Massimo Coppola, Malio Li et al.

We present a framework for dynamic management of structured parallel processing skeletons on serverless platforms. Our goal is to bring HPC-like performance and resilience to serverless and continuum environments while preserving the programmability benefits of skeletons. As a first step, we focus on the well known Farm pattern and its implementation on the open-source OpenFaaS platform, treating autoscaling of the worker pool as a QoS-aware resource management problem. The framework couples a reusable farm template with a Gymnasium-based monitoring and control layer that exposes queue, timing, and QoS metrics to both reactive and learning-based controllers. We investigate the effectiveness of AI-driven dynamic scaling for managing the farm's degree of parallelism via the scalability of serverless functions on OpenFaaS. In particular, we discuss the autoscaling model and its training, and evaluate two reinforcement learning (RL) policies against a baseline of reactive management derived from a simple farm performance model. Our results show that AI-based management can better accommodate platform-specific limitations than purely model-based performance steering, improving QoS while maintaining efficient resource usage and stable scaling behaviour.

DCJan 14
High-Performance Serverless Computing: A Systematic Literature Review on Serverless for HPC, AI, and Big Data

Valerio Besozzi, Matteo Della Bartola, Patrizio Dazzi et al.

The widespread deployment of large-scale, compute-intensive applications such as high-performance computing, artificial intelligence, and big data is leading to convergence between cloud and high-performance computing infrastructures. Cloud providers are increasingly integrating high-performance computing capabilities in their infrastructures, such as hardware accelerators and high-speed interconnects, while researchers in the high-performance computing community are starting to explore cloud-native paradigms to improve scalability, elasticity, and resource utilization. In this context, serverless computing emerges as a promising execution model to efficiently handle highly dynamic, parallel, and distributed workloads. This paper presents a comprehensive systematic literature review of 122 research articles published between 2018 and early 2025, exploring the use of the serverless paradigm to develop, deploy, and orchestrate compute-intensive applications across cloud, high-performance computing, and hybrid environments. From these, a taxonomy comprising eight primary research directions and nine targeted use case domains is proposed, alongside an analysis of recent publication trends and collaboration networks among authors, highlighting the growing interest and interconnections within this emerging research field. Overall, this work aims to offer a valuable foundation for both new researchers and experienced practitioners, guiding the development of next-generation serverless solutions for parallel compute-intensive applications.