Mohammad Pivezhandi

h-index: 2

5 papers

12 citations

Novelty: 56%

AI Score: 39

Ranked #77,052 of 194,257 authors (top 40%); #17,192 in LG (top 43%)

5 Papers

4.7 · DC · Jul 8

HiDVFS: Hierarchical Multi-Agent DVFS for Real-Time OpenMP DAG Workloads

Mohammad Pivezhandi, Abusayeed Saifullah, Ali Jannesari

Leakage power in multicore embedded systems now rivals dynamic power, so DVFS schedulers must respect deadlines and thermal limits, not just average makespan. Existing heuristics lack per-core, temperature-aware control and overlook the irregular execution of OpenMP DAGs. We propose HiDVFS, a general, extensible hierarchical multi-agent DVFS scheduler: a profiler agent selects cores and frequencies, a thermal agent groups cores by temperature, and a priority agent orders tasks under contention, all trained with a makespan-focused reward using short-horizon future-state shaping for sample efficiency. Deadlines are soft, derived from a measured reference cost; a federated schedulability gate keeps operating points feasible, and a calibrated split-conformal shield bounds each action's predicted response time. On Jetson TX2 with multi-seed validation, HiDVFS attains a 4.16 ± 0.58 s L10 makespan, a 2.83x speedup and 32.9% energy reduction over a fairness-corrected GearDVFS port, and a 4.62x average speedup with 55.7% energy reduction across all 12 BOTS benchmarks. Cross-platform results on TX2, Orin NX, and RubikPi show deadline-aware DVFS cuts energy 15 to 18% versus pinning the maximum frequency, and a measured mixed-criticality study shows cluster-aware reservation is required to keep a high-criticality task's deadline-miss ratio at zero.
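
As a rough illustration of the split-conformal idea mentioned in this abstract, the sketch below calibrates a one-sided safety margin for response-time predictions from held-out residuals. It is a generic textbook construction, not the paper's implementation; the function name `conformal_upper_margin`, the sample data, and the 90% coverage level are hypothetical.

```python
import math

def conformal_upper_margin(cal_pred, cal_true, alpha=0.1):
    """Return a margin q so that (prediction + q) upper-bounds the true
    response time with probability >= 1 - alpha, assuming calibration and
    test points are exchangeable. Hypothetical helper for illustration."""
    # One-sided nonconformity score: how far the model under-predicted.
    scores = sorted(t - p for p, t in zip(cal_pred, cal_true))
    n = len(scores)
    # Finite-sample corrected rank used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this coverage level.
        return float("inf")
    return scores[k - 1]

# Hypothetical calibration data: predicted vs. measured response times (ms).
predicted = [10] * 9
measured = [10, 11, 12, 13, 14, 15, 16, 17, 18]
margin = conformal_upper_margin(predicted, measured, alpha=0.1)
```

A scheduler could then reject any operating point whose predicted response time plus `margin` exceeds the deadline; how HiDVFS actually applies its shield is not specified here.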

6.4 · LG · Sep 21, 2024

FlowRL: Flow-Augmented Few-Shot Reinforcement Learning for Semi-Structured Sensor Data

Mohammad Pivezhandi, Abusayeed Saifullah

Reinforcement learning (RL) in few-shot scenarios with limited sensor data is challenging due to insufficient training samples, particularly in applications like Dynamic Voltage and Frequency Scaling (DVFS) where sensor readings are semi-structured with inherent correlations. We propose Flow-Augmented Reinforcement Learning (FlowRL), a novel method that leverages continuous normalizing flows to generate high-quality synthetic data for few-shot RL. By integrating latent space bootstrapping for diversity and feature-weighted flow matching to preserve critical data correlations, FlowRL enhances sample efficiency and policy robustness. Evaluated on a DVFS case study using the NVIDIA Jetson TX2, our approach achieves up to 35% higher frame rates and faster Q-value convergence compared to baselines, demonstrating its effectiveness in resource-constrained environments. FlowRL generalizes to other semi-structured domains, such as robotics and smart grids, offering a scalable solution for data-scarce RL settings.

11.4 · LG · Dec 12, 2025

GraphPerf-RT: A Graph-Driven Performance Model for Hardware-Aware Scheduling of OpenMP Codes

Mohammad Pivezhandi, Mahdi Banisharif, Saeed Bakhshan et al.

Autonomous AI agents on embedded platforms require real-time, risk-aware scheduling under resource and thermal constraints. Classical heuristics struggle with workload irregularity, tabular regressors discard structural information, and model-free reinforcement learning (RL) risks overheating. We introduce GraphPerf-RT, a graph neural network surrogate achieving deep learning accuracy at heuristic speeds (2-7 ms). GraphPerf-RT is, to our knowledge, the first to unify task DAG topology, CFG-derived code semantics, and runtime context (per-core DVFS, thermal state, utilization) in a heterogeneous graph with typed edges encoding precedence, placement, and contention. Evidential regression with Normal-Inverse-Gamma priors provides calibrated uncertainty; we validate on makespan prediction for risk-aware scheduling. Experiments on three ARM platforms (Jetson TX2, Orin NX, RUBIK Pi) achieve R^2 = 0.81 on log-transformed makespan with Spearman rho = 0.95 and conservative uncertainty calibration (PICP = 99.9% at 95% confidence). Integration with four RL methods demonstrates that multi-agent model-based RL with GraphPerf-RT as the world model achieves 66% makespan reduction and 82% energy reduction versus model-free baselines, with zero thermal violations.
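
For readers unfamiliar with the PICP figure quoted in this abstract: the prediction interval coverage probability is the fraction of true values that fall inside their predicted intervals. A minimal sketch follows; the function name `picp` and the sample numbers are hypothetical and not taken from the paper.

```python
def picp(y_true, lower, upper):
    """Prediction interval coverage probability: the fraction of targets
    that lie inside their [lower, upper] interval (bounds inclusive)."""
    if not y_true:
        raise ValueError("need at least one sample")
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)

# Hypothetical makespans with their predicted interval bounds.
coverage = picp([1, 2, 3, 4], [0, 0, 0, 0], [2, 2, 2, 5])
```

A PICP well above the nominal level (here, 99.9% against 95%) suggests intervals that are wider than strictly necessary, which is consistent with the abstract describing the calibration as conservative.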

6.0 · AI · Jan 13

ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms

Mohammad Pivezhandi, Mahdi Banisharif, Abusayeed Saifullah et al.

Dynamic voltage and frequency scaling (DVFS) and task-to-core allocation are critical for thermal management and balancing energy and performance in embedded systems. Existing approaches either rely on utilization-based heuristics that overlook stall times, or require extensive offline profiling for table generation, preventing runtime adaptation. We propose a model-based hierarchical multi-agent reinforcement learning (MARL) framework for thermal- and energy-aware scheduling on multi-core platforms. Two collaborative agents decompose the exponential action space, achieving 358 ms latency for subsequent decisions. First decisions require 3.5 to 8.0 s including one-time LLM feature extraction. An accurate environment model leverages regression techniques to predict thermal dynamics and performance states. When combined with LLM-extracted semantic features, the environment model enables zero-shot deployment for new workloads on trained platforms by generating synthetic training data without requiring workload-specific profiling samples. We introduce LLM-based semantic feature extraction that characterizes OpenMP programs through 13 code-level features without execution. The Dyna-Q-inspired framework integrates direct reinforcement learning with model-based planning, achieving 20x faster convergence than model-free methods. Experiments on BOTS and PolybenchC benchmarks across NVIDIA Jetson TX2, Jetson Orin NX, RubikPi, and Intel Core i7 demonstrate 7.09x better energy efficiency and 4.0x better makespan than the Linux ondemand governor. First-decision latency is 8,300x faster than table-based profiling, enabling practical deployment in dynamic embedded systems.
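
To make the "direct reinforcement learning plus model-based planning" combination concrete, here is a minimal sketch of one step of classic tabular Dyna-Q. It is the textbook algorithm, not the paper's framework, which is described as using a regression-based environment model and multiple agents. The state and action labels, the function name `dyna_q_step`, and all hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

def dyna_q_step(Q, model, s, a, r, s2, actions,
                alpha=0.1, gamma=0.95, n_planning=5, rng=random):
    """One Dyna-Q step: learn from a real transition, record it in a
    deterministic tabular model, then replay simulated transitions."""
    def q_update(s, a, r, s2):
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    q_update(s, a, r, s2)        # direct RL from the real transition
    model[(s, a)] = (r, s2)      # remember what the environment did
    for _ in range(n_planning):  # planning from the learned model
        ps, pa = rng.choice(list(model.keys()))
        pr, ps2 = model[(ps, pa)]
        q_update(ps, pa, pr, ps2)

# Hypothetical usage: states are temperature bins, actions are frequency levels.
Q = defaultdict(float)
model = {}
dyna_q_step(Q, model, "cool", "high_freq", 1.0, "warm",
            ["low_freq", "high_freq"], alpha=0.5, gamma=0.0, n_planning=1)
```

The planning loop is what lets a model-based learner extract more updates per real measurement, which is the general mechanism behind faster convergence relative to model-free methods.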

3.3 · DC · Jan 26, 2025

A Statistical Learning Approach for Feature-Aware Task-to-Core Allocation in Heterogeneous Platforms

Mohammad Pivezhandi, Abusayeed Saifullah, Prashant Modekurthy

Optimizing task-to-core allocation can substantially reduce power consumption in multi-core platforms without degrading user experience. However, many existing approaches overlook critical factors such as parallelism, compute intensity, and heterogeneous core types. In this paper, we introduce a statistical learning approach for feature selection that identifies the most influential features - such as core type, speed, temperature, and application-level parallelism or memory intensity - for accurate environment modeling and efficient energy optimization. Our experiments, conducted with state-of-the-art Linux governors and thermal modeling techniques, show that correlation-aware task-to-core allocation lowers energy consumption by up to 10% and reduces core temperature by up to 5 degrees Celsius compared to random core selection. Furthermore, our compressed, bootstrapped regression model improves thermal prediction accuracy by 6% while cutting model parameters by 16%, yielding an overall mean square error reduction of 61.6% relative to existing approaches. We provided results based on superscalar Intel Core i7 12th Gen processors with 14 cores, but validated our method across a diverse set of hardware platforms and effectively balanced performance, power, and thermal demands through statistical feature evaluation.