Nishant Saurabh

DC
h-index: 14
3 papers
11 citations
Novelty: 43%
AI Score: 39

3 Papers

QUANT-PH · Apr 3
Hybrid Quantum-HPC Middleware Systems for Adaptive Resource, Workload and Task Management

Pradeep Mantha, Florian J. Kiwit, Nishant Saurabh et al.

Hybrid quantum-classical applications pose significant resource management challenges due to heterogeneity and dynamism in both infrastructure and workloads. Quantum-HPC environments integrate quantum processing units (QPUs) with diverse classical resources (CPUs, GPUs), while applications span coupling patterns from tightly coupled execution to loosely coupled task parallelism with varying resource requirements. Traditional HPC schedulers lack visibility into application semantics and cannot respond to fluctuating resource availability at runtime. This paper presents a middleware-based approach for adaptive resource, workload, and task management in hybrid quantum-HPC systems. We make four contributions: (i) a conceptual four-layer middleware architecture that decomposes management across workflow, workload, task, and resource levels, enabling application-aware scheduling over heterogeneous quantum-HPC resources; (ii) a set of execution motifs capturing interaction and coupling characteristics of hybrid applications, realized as quantum mini-apps for systematic workload characterization; (iii) Pilot-Quantum, a middleware framework built on the pilot abstraction that enables late binding and dynamic resource allocation, adapting to resource and workload dynamics at runtime; and (iv) Q-Dreamer, a performance modeling toolkit providing reusable components for informed workload partitioning, including a circuit-cutting optimizer that analytically derives optimal partitioning strategies. Evaluation on heterogeneous HPC platforms (Perlmutter, NVIDIA DGX with H100/B200 GPUs) demonstrates efficient multi-backend orchestration across CPUs, GPUs, and QPUs for diverse execution motifs. Q-Dreamer predicts optimal circuit cutting configurations with up to 82% accuracy.
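The late binding that the pilot abstraction enables can be illustrated with a minimal sketch. This is not Pilot-Quantum's API: the `Pilot` class, its methods, and the resource names are hypothetical, and it only assumes that a pilot acquires resource slots up front and binds queued tasks to whichever slots are free at scheduling time.

```python
from collections import deque


class Pilot:
    """Hypothetical placeholder job that holds resource slots and late-binds tasks."""

    def __init__(self, slots):
        # slots maps a resource kind to the number of free slots, e.g. {"cpu": 1, "qpu": 1}
        self.free = dict(slots)
        self.queue = deque()

    def submit(self, name, kind):
        # Tasks are only queued here; no resource is chosen yet.
        self.queue.append((name, kind))

    def schedule(self):
        """Bind queued tasks to free slots; tasks whose resource is busy stay queued."""
        bound, waiting = [], deque()
        while self.queue:
            name, kind = self.queue.popleft()
            if self.free.get(kind, 0) > 0:
                self.free[kind] -= 1
                bound.append((name, kind))
            else:
                waiting.append((name, kind))
        self.queue = waiting
        return bound

    def release(self, kind):
        # Called when a task finishes and its slot becomes available again.
        self.free[kind] = self.free.get(kind, 0) + 1


pilot = Pilot({"cpu": 1, "qpu": 1})
pilot.submit("prepare", "cpu")
pilot.submit("circuit-a", "qpu")
pilot.submit("circuit-b", "qpu")

first_round = pilot.schedule()   # circuit-b waits: the single QPU slot is taken
pilot.release("qpu")             # circuit-a finishes
second_round = pilot.schedule()  # circuit-b is bound only now
```

The point of the sketch is that `circuit-b` is assigned to a QPU slot when one becomes free, not when it is submitted.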

DC · May 24
Beyond Thread States: Diagnosing Performance Degradation with eBPF and Thread Dynamics

Diogo Landau, Jorge G. Barbosa, Nishant Saurabh

Online data-intensive applications face performance degradation from load variability and resource interference. While approaches based on Thread State Analysis (TSA) can identify constrained subsystems, they lack the granularity to reveal the inter-thread dependencies that propagate degradation. In this paper, we present an application-agnostic performance degradation analysis method that extends TSA by capturing fine-grained thread dynamics. We implement 16 eBPF-based metrics across six kernel subsystems (scheduling, VFS, networking, futex, I/O multiplexing, and block I/O), which enable tracing thread interactions with specific resources such as futexes, sockets, and disks. Our method leverages the fact that performance degradation propagates along inter-thread dependencies, and that a subset of thread-resource interactions can capture common degradation patterns. To this end, we employ a selective thread tracking algorithm that traces performance issues from entry-point threads to constrained resources. Experimentation with diverse applications under variable workloads and resource contention shows that our method successfully diagnoses CPU, disk, lock, and external service contention with minimal overhead, while also revealing internal application constraints.
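The idea of tracing from an entry-point thread to a constrained resource can be sketched as a walk over wait dependencies. This is an illustrative assumption, not the paper's algorithm: the `waits` structure, the thread and resource names, and the "follow the heaviest wait" rule are all hypothetical, and the wait times would in practice come from kernel-level metrics rather than a hand-written dictionary.

```python
def trace_constraint(entry_thread, waits):
    """Follow the heaviest wait edge from an entry-point thread.

    waits maps a thread name to a list of (resource, wait_ms, holder) tuples,
    where holder is the thread currently blocking that resource, or None if the
    resource is not held by another tracked thread (e.g. a disk).
    """
    path = []
    seen = set()
    thread = entry_thread
    while thread is not None and thread not in seen:
        seen.add(thread)  # guard against dependency cycles
        edges = waits.get(thread, [])
        if not edges:
            break
        resource, wait_ms, holder = max(edges, key=lambda edge: edge[1])
        path.append((thread, resource, wait_ms))
        thread = holder
    return path


# Hypothetical measurements: the HTTP worker mostly waits on a lock held by
# the DB writer, which in turn mostly waits on the disk.
waits = {
    "http-worker": [("futex:0x1", 120.0, "db-writer"), ("socket:8080", 5.0, None)],
    "db-writer": [("disk:sda", 300.0, None), ("futex:0x2", 2.0, "logger")],
}

path = trace_constraint("http-worker", waits)
root_resource = path[-1][1]
```

Only the threads on the dependency path are visited, which mirrors the "selective" aspect described in the abstract: `logger` is never inspected because it is not on the heaviest path.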

LG · May 31, 2025
Federated learning framework for collaborative remaining useful life prognostics: an aircraft engine case study

Diogo Landau, Ingeborg de Pater, Mihaela Mitici et al.

Complex systems such as aircraft engines are continuously monitored by sensors. In predictive aircraft maintenance, the collected sensor measurements are used to estimate the health condition and the Remaining Useful Life (RUL) of such systems. However, a major challenge when developing prognostics is the limited number of run-to-failure data samples. This challenge could be overcome if multiple airlines shared their run-to-failure data samples so that sufficient learning can be achieved. Due to privacy concerns, however, airlines are reluctant to share their data in a centralized setting. In this paper, a collaborative federated learning (FL) framework is therefore developed instead. Here, several airlines cooperate to train a collective RUL prognostic machine learning model without the need to centrally share their data. For this, a decentralized validation procedure is proposed to validate the prognostics model without sharing any data. Moreover, sensor data is often noisy and of low quality. This paper therefore proposes four novel methods to aggregate the parameters of the global prognostic model. These methods enhance the robustness of the FL framework against noisy data. The proposed framework is illustrated for training a collaborative RUL prognostic model for aircraft engines, using the N-CMAPSS dataset. Here, six airlines are considered, which collaborate in the FL framework to train a collective RUL prognostic model for their aircraft engines. When comparing the proposed FL framework with the case where each airline independently develops its own prognostic model, the results show that FL leads to more accurate RUL prognostics for five out of the six airlines. Moreover, the novel robust aggregation methods render the FL framework robust to noisy data samples.
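Why the aggregation step matters for robustness can be seen in a small sketch. The abstract does not specify the four proposed aggregation methods, so the code below does not reproduce them: it contrasts standard weighted averaging (FedAvg) with a coordinate-wise median, a generic robust aggregator swapped in purely for illustration. The client updates and sample counts are made-up values.

```python
from statistics import median


def fed_avg(client_params, client_sizes):
    """Sample-weighted mean of client parameter vectors (standard FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def coordinate_median(client_params):
    """Coordinate-wise median: a generic robust aggregator, not the paper's method."""
    dim = len(client_params[0])
    return [median(params[i] for params in client_params) for i in range(dim)]


# Six hypothetical airlines, each sending a 2-parameter model update.
# The last update is corrupted, standing in for a client with noisy data.
updates = [
    [1.0, 2.0],
    [1.1, 2.1],
    [0.9, 1.9],
    [1.0, 2.0],
    [1.2, 2.2],
    [50.0, -40.0],
]
sizes = [100, 100, 100, 100, 100, 100]

avg = fed_avg(updates, sizes)      # pulled far away by the corrupted client
med = coordinate_median(updates)   # stays close to the five consistent clients
```

With these values the averaged parameters are dominated by the single corrupted update, while the median stays near the consensus of the other five clients.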