LG · Jan 24, 2021

Analytical Characterization and Design Space Exploration for Optimization of CNNs

arXiv:2101.09808v2 · 5 citations
AI Analysis

This work addresses the data-movement bottleneck in CNNs, a practical concern for machine learning practitioners, though its contribution is incremental because it builds on existing loop-level optimization techniques.

The paper tackles the problem of optimizing convolutional neural networks (CNNs) on multi-core CPUs by developing an analytical modeling approach to find the best loop-level optimization configurations, achieving comparable or better performance than state-of-the-art libraries and auto-tuning optimizers.

Moving data through the memory hierarchy is a fundamental bottleneck that can limit the performance of core machine learning algorithms such as convolutional neural networks (CNNs). Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations for reducing data movement. However, the search space for finding the best loop-level optimization configuration is explosively large. This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs. Experimental evaluation shows that this approach achieves comparable or better performance than state-of-the-art libraries and auto-tuning-based optimizers for CNNs.
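To make "loop tiling" and "loop permutation" concrete, here is a minimal sketch in plain Python. It is not the paper's code or model: the function names, the tensor layouts (`inp[c][h][w]`, `ker[k][c][r][s]`), the stride-1/no-padding convolution, and the choice to tile only the `k`, `c`, `h`, `w` loops are all assumptions made for illustration. The tile sizes and the order of the tile loops are the kind of knobs a loop-level optimization configuration would set.

```python
def conv2d_naive(inp, ker):
    """Direct convolution (stride 1, no padding) as a plain loop nest.

    Assumed layouts: inp[c][h][w], ker[k][c][r][s] -> out[k][h][w].
    """
    K, C = len(ker), len(inp)
    R, S = len(ker[0][0]), len(ker[0][0][0])
    out_h = len(inp[0]) - R + 1
    out_w = len(inp[0][0]) - S + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):
        for c in range(C):
            for h in range(out_h):
                for w in range(out_w):
                    for r in range(R):
                        for s in range(S):
                            out[k][h][w] += inp[c][h + r][w + s] * ker[k][c][r][s]
    return out


def conv2d_tiled(inp, ker, tk, tc, th, tw):
    """Same computation with the k, c, h, w loops tiled.

    tk, tc, th, tw are hypothetical tile sizes. Reordering the four outer
    (tile) loops is one instance of loop permutation; the result is the
    same, but the order in which data is touched changes.
    """
    K, C = len(ker), len(inp)
    R, S = len(ker[0][0]), len(ker[0][0][0])
    out_h = len(inp[0]) - R + 1
    out_w = len(inp[0][0]) - S + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(K)]
    # Outer loops step from tile to tile.
    for k0 in range(0, K, tk):
        for c0 in range(0, C, tc):
            for h0 in range(0, out_h, th):
                for w0 in range(0, out_w, tw):
                    # Inner loops stay inside one tile; min() handles
                    # partial tiles at the boundaries.
                    for k in range(k0, min(k0 + tk, K)):
                        for c in range(c0, min(c0 + tc, C)):
                            for h in range(h0, min(h0 + th, out_h)):
                                for w in range(w0, min(w0 + tw, out_w)):
                                    for r in range(R):
                                        for s in range(S):
                                            out[k][h][w] += (
                                                inp[c][h + r][w + s] * ker[k][c][r][s]
                                            )
    return out
```

Both functions compute the same output for any positive tile sizes; only the traversal order differs. Pure Python will not show the cache effects themselves, so this sketch illustrates the structure of the transformation rather than its performance.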

Code Implementations: 1 repo