LG | Nov 2, 2020

IOS: Inter-Operator Scheduler for CNN Acceleration

arXiv:2011.01302v20.1086 citations | Has Code
AI Analysis | 55

This paper addresses underutilized hardware parallelism in deep learning frameworks, a problem that is most pronounced at smaller batch sizes, and delivers incremental improvements in scheduling efficiency.

The paper tackles the performance gap in CNN inference by optimizing inter-operator parallelization, achieving a 1.1 to 1.5x speedup over state-of-the-art libraries like TensorRT on modern benchmarks.

To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization. However, a single operator can no longer fully utilize the available parallelism given the rapid advances in high-performance hardware, resulting in a large gap between the peak performance and the real performance. This performance gap is more severe under smaller batch sizes. In this work, we extensively study the parallelism between operators and propose Inter-Operator Scheduler (IOS) to automatically schedule multiple operators' parallel execution through a novel dynamic programming algorithm. IOS consistently outperforms state-of-the-art libraries (e.g., TensorRT) by 1.1 to 1.5x on modern CNN benchmarks. The code to reproduce each experiment is available at: https://github.com/mit-han-lab/inter-operator-scheduler.
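The abstract names a dynamic programming algorithm but does not spell it out, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes a small operator DAG, a made-up cost model in which a stage of concurrently running operators takes as long as its slowest member, and a cap on how many operators may share a stage. The names `schedule`, `deps`, `latency`, and `max_parallel` are hypothetical and do not come from the IOS code base.

```python
from functools import lru_cache
from itertools import combinations


def schedule(ops, deps, latency, max_parallel):
    """Toy inter-operator scheduler (illustrative assumption, not IOS itself).

    ops:          list of operator names
    deps:         dict mapping an operator to the set of its predecessors
    latency:      dict mapping an operator to its standalone latency
    max_parallel: maximum number of operators allowed in one stage
    Returns (total_latency, stages), where stages is a tuple of tuples.
    """
    index = {op: i for i, op in enumerate(ops)}
    # Bitmask of predecessors for every operator.
    pred_mask = [0] * len(ops)
    for op, preds in deps.items():
        for p in preds:
            pred_mask[index[op]] |= 1 << index[p]
    full = (1 << len(ops)) - 1

    def stage_cost(group):
        # Assumed cost model: operators in a stage overlap perfectly,
        # so the stage lasts as long as its slowest operator.
        return max(latency[ops[i]] for i in group)

    @lru_cache(maxsize=None)
    def best(done):
        # done: bitmask of operators that have already been executed.
        if done == full:
            return 0.0, ()
        # An operator is ready if it is not done and all predecessors are done.
        ready = [
            i for i in range(len(ops))
            if not ((done >> i) & 1) and (pred_mask[i] & ~done) == 0
        ]
        best_cost, best_plan = float("inf"), ()
        for k in range(1, min(max_parallel, len(ready)) + 1):
            for group in combinations(ready, k):
                mask = done
                for i in group:
                    mask |= 1 << i
                rest_cost, rest_plan = best(mask)
                cost = stage_cost(group) + rest_cost
                if cost < best_cost:
                    best_cost = cost
                    best_plan = (tuple(ops[i] for i in group),) + rest_plan
        return best_cost, best_plan

    return best(0)
```

For a diamond-shaped graph (`a` feeds `b` and `c`, which both feed `d`), calling `schedule` with `max_parallel=2` places `b` and `c` in the same stage, while `max_parallel=1` falls back to a purely sequential order. The state space grows exponentially with the number of operators, so this sketch is only practical for small graphs.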

Code Implementations: 1 repo