LG · AR · Apr 13

Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores

Yixian Shen, Chaoyao Shen, Jan Deen, George Floros, Andy Pimentel, Anuj Pathania

arXiv:2604.11948 · 67.71 citations · h-index: 17

Predicted impact: top 35% in LG · last 90 days

Originality: Incremental advance

AI Analysis

For system architects and engineers deploying LFMs on emerging 3D-stacked CPU architectures, this work addresses the critical challenge of thermal-aware scheduling under kernel diversity and system heterogeneity.

The paper tackles thermal and performance management for Large Foundation Model inference on 3D S-NUCA many-core CPUs. AILFM uses active imitation learning to learn near-optimal scheduling policies, outperforming state-of-the-art baselines in thermal safety and performance.

Large Foundation Model (LFM) inference is both memory- and compute-intensive and has traditionally relied on GPUs. However, the limited availability and high cost of GPUs have motivated the adoption of high-performance general-purpose CPUs, especially emerging 3D-stacked Static Non-Uniform Cache Architecture (3D S-NUCA) systems. These architectures offer enhanced bandwidth and locality but suffer from severe thermal challenges and uneven cache latencies due to 3D Networks-on-Chip (NoCs). Optimal management of thread migration and voltage/frequency (V/f) scaling is non-trivial due to LFM kernel diversity and system heterogeneity. Existing thermal management approaches often rely on oversimplified analytical models and lack adaptability. We propose AILFM, an Active Imitation Learning (AIL)-based scheduling framework that learns near-optimal thermal-aware scheduling policies from Oracle demonstrations with minimal run-time overhead. AILFM accounts for both core-level performance heterogeneity and kernel-specific behavior in LFMs to maintain thermal safety while maximizing performance. Extensive experiments show that AILFM outperforms state-of-the-art baselines and generalizes well across diverse LFM workloads.
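The abstract does not describe the learning procedure in detail, so the following is only a minimal sketch of the general idea of active imitation learning, not the AILFM method. It assumes a toy one-core thermal model, three hypothetical V/f levels, and a 1-nearest-neighbour learner that queries the oracle only when no labelled state of the same kernel type lies within `radius` degrees. All names and constants (`FREQS`, `T_LIMIT`, `heat`, `oracle`, `nearest`, `train`) are illustrative assumptions.

```python
import random

FREQS = [1.0, 2.0, 3.0]  # hypothetical V/f levels (GHz); not from the paper
T_LIMIT = 80.0           # hypothetical thermal threshold (deg C)


def heat(freq, kernel):
    """Toy heating model: compute-bound kernels (kernel == 1) heat more per GHz."""
    return freq * (6.0 if kernel == 1 else 3.0)


def oracle(temp, kernel):
    """Expert policy: index of the highest level predicted to stay under T_LIMIT.

    Falls back to the lowest level (index 0) when no level is predicted safe.
    """
    best = 0
    for i, f in enumerate(FREQS):
        if temp + heat(f, kernel) <= T_LIMIT:
            best = i
    return best


def nearest(data, temp, kernel):
    """Return (distance, action) of the closest labelled state with the same kernel."""
    same = [(abs(t - temp), a) for t, k, a in data if k == kernel]
    return min(same) if same else (float("inf"), 0)


def train(steps=2000, radius=0.5, seed=0):
    """Roll out the learner and query the oracle only on uncertain states."""
    rng = random.Random(seed)
    data, queries, temp = [], 0, 50.0
    for _ in range(steps):
        kernel = rng.randint(0, 1)  # 0: memory-bound, 1: compute-bound (assumed)
        dist, action = nearest(data, temp, kernel)
        if dist > radius:
            # Uncertain state: ask the oracle, store the label, and act on it.
            action = oracle(temp, kernel)
            data.append((temp, kernel, action))
            queries += 1
        # Otherwise act on the learner's own prediction, so that labelled
        # states come from the trajectory the learner actually visits.
        temp = 40.0 + 0.7 * (temp - 40.0) + heat(FREQS[action], kernel)
    return data, queries


if __name__ == "__main__":
    data, queries = train()
    print(f"oracle queries: {queries} of 2000 steps")
```

The point of the sketch is the query rule: the oracle is consulted only where the learner lacks nearby demonstrations, so the number of expensive oracle calls stays well below the number of scheduling decisions. A real scheduler would replace the toy thermal model and the nearest-neighbour learner with measured data and a trained policy.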
