AR AI DC Dec 14, 2023

Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets

Mohanad Odema, Hyoukjun Kwon, Mohammad Abdullah Al Faruque

arXiv:2312.09401v11.21 citations h-index: 40

Originality: Incremental advance

AI Analysis

This work addresses the compute demands of multi-model workloads, including large language models, and is aimed at AI hardware designers. It is incremental because it builds on existing chiplet and scheduling concepts.

The paper tackles the problem of scheduling multi-model inference on heterogeneous chiplet-based accelerators, achieving up to 2.2x throughput and 1.9x energy efficiency improvements over a monolithic accelerator.

To address the increasing compute demand of recent multi-model workloads that include heavy models such as large language models, we propose deploying heterogeneous chiplet-based multi-chip module (MCM) accelerators. We develop an advanced scheduling framework for heterogeneous MCM accelerators that comprehensively considers complex heterogeneity and inter-chiplet pipelining. Our experiments using the framework on GPT-2 and ResNet-50 models on a 4-chiplet system show up to 2.2x and 1.9x increases in throughput and energy efficiency, respectively, compared to a monolithic accelerator with an optimized output-stationary dataflow.
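
The idea of inter-chiplet pipelining over heterogeneous chiplets can be illustrated with a toy search. The sketch below is not the paper's framework: it assumes each chiplet runs one contiguous group of layers, that per-layer latencies on each chiplet are known, and that inter-chiplet communication cost is zero. The function names and the latency table are hypothetical. It enumerates every contiguous split of layers across chiplets and keeps the one with the smallest bottleneck stage, since the slowest stage bounds steady-state pipeline throughput.

```python
from itertools import combinations


def pipeline_bottleneck(layer_costs, cuts):
    """Return the latency of the slowest pipeline stage.

    layer_costs[c][l] is a hypothetical latency of layer l on chiplet c.
    cuts is a strictly increasing tuple of split indices; chiplet c runs
    the layers between the c-th and (c+1)-th boundary.
    """
    num_layers = len(layer_costs[0])
    bounds = [0, *cuts, num_layers]
    stage_latencies = [
        sum(layer_costs[c][bounds[c]:bounds[c + 1]])
        for c in range(len(layer_costs))
    ]
    return max(stage_latencies)


def best_partition(layer_costs):
    """Exhaustively search contiguous layer-to-chiplet splits.

    Returns (bottleneck_latency, cuts) for the split whose slowest
    stage is fastest. Every chiplet receives at least one layer.
    """
    num_chiplets = len(layer_costs)
    num_layers = len(layer_costs[0])
    best = None
    for cuts in combinations(range(1, num_layers), num_chiplets - 1):
        bottleneck = pipeline_bottleneck(layer_costs, cuts)
        if best is None or bottleneck < best[0]:
            best = (bottleneck, cuts)
    return best


# Hypothetical latencies: chiplet 0 is fast on early layers,
# chiplet 1 is fast on late layers.
example_costs = [
    [1, 2, 3, 4],  # chiplet 0
    [4, 3, 2, 1],  # chiplet 1
]
```

Calling `best_partition(example_costs)` searches the three possible two-stage splits. Exhaustive enumeration is only practical for small layer counts; a realistic scheduler would also need to model communication, memory, and energy, none of which this sketch covers.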
