Improvements in Interlayer Pipelining of CNN Accelerators Using Genetic Algorithms
This work addresses performance and efficiency bottlenecks in deploying CNNs on edge devices, and represents an incremental improvement in hardware acceleration methods.
The paper tackles inefficient data movement in CNN accelerators for edge platforms by developing a layer fusion technique based on a Genetic Algorithm, yielding a 1.8$\times$ increase in energy efficiency and a 1.9$\times$ improvement in energy-delay product for MobileNet-v3 on a SIMBA-like mobile architecture.
Deploying Convolutional Neural Networks (CNNs) on edge platforms necessitates efficient hardware acceleration. Any unnecessary data movement in such accelerators can unacceptably degrade performance and efficiency. To address this, we develop a layer fusion technique for CNNs that reduces off-chip data communication by applying a Genetic Algorithm (GA) to graph-based topological sort. Results show a 1.8$\times$ increase in energy efficiency and a 1.9$\times$ improvement in energy-delay product (EDP) for MobileNet-v3 on a SIMBA-like mobile architecture. Our approach consistently improves workload performance, with average EDP improvements of 1.4$\times$ for SIMBA and 1.12$\times$ for Eyeriss.
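
To make the idea of a GA searching over topological orderings concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this work. It makes several assumptions that are not taken from the paper: the toy layer graph `GRAPH`, the cost proxy `off_chip_cost` (which simply counts producer-consumer edges whose endpoints are not adjacent in the schedule, standing in for an off-chip transfer), and a mutation-only evolutionary loop with no crossover. All function names are hypothetical.

```python
import random

# Hypothetical toy layer graph (producer -> consumers); not taken from the paper.
GRAPH = {
    "conv1": ["conv2a", "conv2b"],
    "conv2a": ["add"],
    "conv2b": ["conv3b"],
    "conv3b": ["add"],
    "add": ["pool"],
    "pool": [],
}


def random_topological_order(graph, rng):
    """Kahn's algorithm, picking randomly among the layers that are ready."""
    indegree = {node: 0 for node in graph}
    for consumers in graph.values():
        for consumer in consumers:
            indegree[consumer] += 1
    ready = [node for node, degree in indegree.items() if degree == 0]
    order = []
    while ready:
        node = ready.pop(rng.randrange(len(ready)))
        order.append(node)
        for consumer in graph[node]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)
    return order


def is_valid_order(graph, order):
    """True if every layer appears once and after all of its producers."""
    position = {node: i for i, node in enumerate(order)}
    return len(position) == len(graph) == len(order) and all(
        position[p] < position[c] for p in graph for c in graph[p]
    )


def off_chip_cost(graph, order):
    """Assumed proxy: an edge costs 1 if its endpoints are not adjacent."""
    position = {node: i for i, node in enumerate(order)}
    return sum(
        1 for p in graph for c in graph[p] if position[c] - position[p] > 1
    )


def mutate(graph, order, rng):
    """Swap two neighbouring layers when no edge links them (keeps validity)."""
    child = list(order)
    i = rng.randrange(len(child) - 1)
    first, second = child[i], child[i + 1]
    if second not in graph[first]:
        child[i], child[i + 1] = second, first
    return child


def evolve(graph, generations=30, population_size=12, seed=0):
    """Mutation-only evolutionary search over topological orders."""
    rng = random.Random(seed)
    population = [
        random_topological_order(graph, rng) for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda order: off_chip_cost(graph, order))
        survivors = population[: population_size // 2]  # elitist selection
        children = [
            mutate(graph, rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=lambda order: off_chip_cost(graph, order))


if __name__ == "__main__":
    best = evolve(GRAPH)
    print(best, off_chip_cost(GRAPH, best))
```

Restricting mutation to swaps of adjacent, unconnected layers means every candidate remains a valid topological order, so no repair step is needed. A realistic fitness function would instead come from an accelerator cost model that reports energy and latency, for example to evaluate EDP as the product of energy and delay.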