Nikolaos Georgantas

LG
h-index2
4papers
99citations
Novelty28%
AI Score37

4 Papers

LGDec 21, 2025
ML Inference Scheduling with Predictable Latency

Haidong Zhao, Nikolaos Georgantas

Machine learning (ML) inference serving systems can schedule requests to improve GPU utilization and to meet service level objectives (SLOs) or deadlines. However, improving GPU utilization may compromise latency-sensitive scheduling, as concurrent tasks contend for GPU resources and thereby introduce interference. Given that interference effects introduce unpredictability in scheduling, neglecting them may compromise SLO or deadline satisfaction. Nevertheless, existing interference prediction approaches remain limited in several respects, which may restrict their usefulness for scheduling. First, they are often coarse-grained, which ignores runtime co-location dynamics and thus restricts their accuracy in interference prediction. Second, they tend to use a static prediction model, which may not effectively cope with different workload characteristics. To this end, we evaluate the potential limitations of existing interference prediction approaches and outline our ongoing work toward achieving efficient ML inference scheduling.

14.2LGApr 30
Strait: Perceiving Priority and Interference in ML Inference Serving

Haidong Zhao, Nikolaos Georgantas

Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. However, limited support for task prioritization and insufficient latency estimation under concurrent execution may restrict their applicability in on-premises scenarios. We present \emph{Strait}, a serving system designed to enhance deadline satisfaction for dual-priority inference traffic under high GPU utilization. To improve latency estimation, Strait models potential contention during data transfer and accounts for kernel execution interference through an adaptive prediction model. By drawing on these predictions, it performs priority-aware scheduling to deliver differentiated handling. Evaluation results under intense workloads suggest that Strait reduces deadline violations for high-priority tasks by 1.02 to 11.18 percentage points while incurring acceptable costs on low-priority tasks. Compared to software-defined preemption approaches, Strait also exhibits more equitable performance.

AIJul 1, 2017
A study of existing Ontologies in the IoT-domain

Garvita Bajaj, Rachit Agarwal, Pushpendra Singh et al.

Several domains have adopted the increasing use of IoT-based devices to collect sensor data for generating abstractions and perceptions of the real world. This sensor data is multi-modal and heterogeneous in nature. This heterogeneity induces interoperability issues while developing cross-domain applications, thereby restricting the possibility of reusing sensor data to develop new applications. As a solution to this, semantic approaches have been proposed in the literature to tackle problems related to interoperability of sensor data. Several ontologies have been proposed to handle different aspects of IoT-based sensor data collection, ranging from discovering the IoT sensors for data collection to applying reasoning on the collected sensor data for drawing inferences. In this paper, we survey these existing semantic ontologies to provide an overview of the recent developments in this field. We highlight the fundamental ontological concepts (e.g., sensor-capabilities and context-awareness) required for an IoT-based application, and survey the existing ontologies which include these concepts. Based on our study, we also identify the shortcomings of currently available ontologies, which serves as a stepping stone to state the need for a common unified ontology for the IoT domain.

SENov 28, 2016
Multi-Objective Service Composition in Ubiquitous Environments with Service Dependencies

Nebil Ben Mabrouk, Nikolaos Georgantas, Valérie Issarny

Service composition is a widely used method in ubiquitous computing that enables accomplishing complex tasks required by users based on elementary (hardware and software) services available in ubiquitous environments. To ensure that users experience the best Quality of Service (QoS) with respect to their quality needs, service composition has to be QoS-aware. Establishing QoS-aware service compositions entails efficient service selection taking into account the QoS requirements of users. A challenging issue towards this purpose is to consider service selection under global QoS requirements (i.e., requirements imposed by the user on the whole task), which is of high computational cost. This challenge is even more relevant when we consider the dynamics, limited computational resources and timeliness constraints of ubiquitous environments. To cope with the above challenge, we present QASSA, an efficient service selection algorithm that provides the appropriate ground for QoS-aware service composition in ubiquitous environments. QASSA formulates service selection under global QoS requirements as a set-based optimisation problem, and solves this problem by combining local and global selection techniques. In particular, it introduces a novel way of using clustering techniques to enable fine-grained management of trade-offs between QoS objectives. QASSA further considers: (i) dependencies between services, (ii) adaptation at run-time, and (iii) both centralised and distributed design fashions. Results of experimental studies performed using real QoS data are presented to illustrate the timeliness and optimality of QASSA.