Pietro Spadaccino

24.1AIMay 19

Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption

Mert Yildiz, Pietro Spadaccino, Alexey Rolich et al.

Modern deployments of Large Language Models (LLMs) increasingly require serving multiple models with diverse architectures, sizes, and specialization on shared, heterogeneous hardware. This setting introduces new challenges for resource allocation, dispatching, and scheduling, particularly under GPU memory constraints where partial CPU-GPU offloading and preemption become necessary. While existing systems primarily optimize throughput for a single model, comparatively little work addresses multi-model scheduling under these conditions. In this paper, we present an empirical study of how different LLMs behave across hardware platforms, focusing on the performance implications of layer offloading and preemption. We show that offloading leads to strongly non-linear and model-dependent degradation in decode throughput, with smaller models exhibiting sharper sensitivity to reduced GPU residency. We further demonstrate that preemption incurs substantial overhead, largely dominated by model state reload rather than key-value cache transfer, and that this cost varies significantly across models and hardware platforms. Additionally, we highlight the role of sequence length and interconnect bandwidth in amplifying data movement and execution inefficiencies. Based on these findings, we identify a set of key features that future schedulers must consider, including model-specific offloading sensitivity, workload characteristics, and the cost structure of preemption and data transfer. These insights provide guidance for the design of next-generation LLM serving systems capable of efficiently managing heterogeneous, multi-model workloads with hybrid CPU-GPU execution.

CRDec 2, 2020

Intrusion Detection Systems for IoT: opportunities and challenges offered by Edge Computing and Machine Learning

Pietro Spadaccino, Francesca Cuomo

Key components of current cybersecurity methods are the Intrusion Detection Systems (IDSs) were different techniques and architectures are applied to detect intrusions. IDSs can be based either on cross-checking monitored events with a database of known intrusion experiences, known as signature-based, or on learning the normal behavior of the system and reporting whether some anomalous events occur, named anomaly-based. This work is dedicated to survey the application of IDS to the Internet of Things (IoT) networks, where also the edge computing is used to support the IDS implementation. New challenges that arise when deploying an IDS in an edge scenario are identified and remedies are proposed. We focus on anomaly-based IDSs, showing the main techniques that can be leveraged to detect anomalies and we present machine learning techniques and their application in the context of an IDS, describing the expected advantages and disadvantages that a specific technique could cause.

Pietro Spadaccino

2 Papers