Shisha: Online scheduling of CNN pipelines on heterogeneous architectures
This work addresses the challenge of optimizing CNN workloads for modern chiplet-based systems, offering a significant speedup in schedule exploration for domain-specific applications.
The paper tackles the problem of scheduling CNN pipelines on heterogeneous chiplet architectures. It proposes Shisha, an online approach that improves convergence time by ~35x on average compared to Simulated Annealing, Hill Climbing and Pipe-Search, while often finding better solutions.
Chiplets have become a common methodology in modern chip design. Chiplets improve yield and enable heterogeneity at the level of cores, the memory subsystem and the interconnect. Convolutional Neural Networks (CNNs) have high computational, bandwidth and memory capacity requirements owing to their increasingly large number of weights. Thus, to exploit chiplet-based architectures, CNNs must be optimized in terms of scheduling and workload distribution among computing resources. We propose Shisha, an online approach to generate and schedule parallel CNN pipelines on chiplet architectures. Shisha targets heterogeneity in compute performance and memory bandwidth and tunes the pipeline schedule through a fast online exploration technique. We compare Shisha with Simulated Annealing, Hill Climbing and Pipe-Search. On average, Shisha improves convergence time by ~35x compared to the other exploration algorithms. Despite the quick exploration, Shisha's solution is often better than that of the other heuristic exploration algorithms.
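
To make the scheduling problem concrete, the sketch below models a CNN as a list of per-layer costs, splits it into contiguous pipeline stages, and assigns each stage to a chiplet with its own relative speed. Pipeline throughput is limited by the slowest stage, so the tuner repeatedly shifts one boundary layer out of the bottleneck stage while that helps. This is not Shisha's algorithm; the cost model, the greedy move rule, and all names (`stage_times`, `bottleneck`, `tune`) are assumptions chosen only to illustrate what tuning a pipeline schedule on heterogeneous resources means.

```python
from typing import List, Tuple


def stage_times(costs: List[float], cuts: List[int], speeds: List[float]) -> List[float]:
    """Time of each stage: summed layer cost divided by the chiplet's speed.

    cuts[i] is the index of the first layer of stage i+1 (hypothetical encoding).
    """
    bounds = [0] + list(cuts) + [len(costs)]
    return [sum(costs[bounds[i]:bounds[i + 1]]) / speeds[i] for i in range(len(speeds))]


def bottleneck(costs: List[float], cuts: List[int], speeds: List[float]) -> float:
    """The slowest stage determines the pipeline's steady-state throughput."""
    return max(stage_times(costs, cuts, speeds))


def _valid(cuts: List[int], n_layers: int) -> bool:
    """Every stage must keep at least one layer."""
    bounds = [0] + list(cuts) + [n_layers]
    return all(a < b for a, b in zip(bounds, bounds[1:]))


def tune(costs: List[float], cuts: List[int], speeds: List[float],
         max_steps: int = 100) -> Tuple[List[int], float]:
    """Greedy local tuning (illustrative assumption, not the paper's method)."""
    cuts = list(cuts)
    best = bottleneck(costs, cuts, speeds)
    for _ in range(max_steps):
        times = stage_times(costs, cuts, speeds)
        s = times.index(max(times))  # index of the bottleneck stage
        candidates = []
        if s > 0:  # hand the first layer of stage s to the previous stage
            c = cuts.copy()
            c[s - 1] += 1
            candidates.append(c)
        if s < len(speeds) - 1:  # hand the last layer of stage s to the next stage
            c = cuts.copy()
            c[s] -= 1
            candidates.append(c)
        candidates = [c for c in candidates if _valid(c, len(costs))]
        if not candidates:
            break
        cand = min(candidates, key=lambda c: bottleneck(costs, c, speeds))
        t = bottleneck(costs, cand, speeds)
        if t >= best:  # no move improves the bottleneck: stop exploring
            break
        cuts, best = cand, t
    return cuts, best


# Six equal layers, a fast chiplet (speed 2.0) followed by a slow one (speed 1.0).
print(tune([4.0] * 6, [3], [2.0, 1.0]))  # → ([4], 8.0)
```

Starting from an even 3/3 split, the slow chiplet is the bottleneck (12 time units versus 6), and moving one layer to the fast chiplet balances both stages at 8. A real explorer would also have to account for memory bandwidth and for which chiplet each stage is mapped to, both of which this sketch leaves out.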