Arithmetic Intensity Balancing Convolution for Hardware-aware Efficient Block Design
This work addresses latency reduction for lightweight neural networks on edge devices, but it appears incremental because it builds on existing models such as MobileNetV1 and ResNet50.
The paper tackles latency on AI accelerators by proposing arithmetic intensity balancing convolution (ABConv), which improves hardware performance without accuracy loss. It reports significant latency reduction on an Arm Ethos-U65 NPU in tests with MobileNetV1 and ResNet50 on CIFAR100.
As deep learning advances, edge devices and lightweight neural networks are becoming more important. To reduce latency on an AI accelerator, it is essential not only to reduce FLOPs but also to improve hardware performance. We propose an arithmetic intensity balancing convolution (ABConv) to address a specific bottleneck: for convolutions with a small spatial size, the overall arithmetic intensity is limited by the small weight arithmetic intensity. ABConv raises the upper bound on overall arithmetic intensity and significantly reduces latency without sacrificing accuracy. We measured the latency and hardware performance of ABConv on the Arm Ethos-U65 NPU in various configurations, and used it to replace parts of MobileNetV1 and ResNet50 in image classification on CIFAR100.
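To make the bottleneck concrete, the sketch below computes roofline-style arithmetic intensities for a single convolution. It is only an illustration under stated assumptions, not the paper's formulation: intensity is taken as multiply-accumulates (MACs) per element moved, the convolution is stride 1 with "same" padding, and the function name `conv_intensity` and its parameters are hypothetical.

```python
def conv_intensity(h, w, c_in, c_out, k):
    """Return MACs per element moved for one convolution layer.

    Assumptions (not from the paper): stride 1, 'same' padding, so the
    output is h x w; every weight and activation element is moved once.
    """
    macs = h * w * c_in * c_out * k * k      # total multiply-accumulates
    weights = c_in * c_out * k * k           # weight elements
    activations = h * w * (c_in + c_out)     # input + output elements
    return {
        # For a standard convolution this reduces to h * w.
        "weight": macs / weights,
        "activation": macs / activations,
        # Overall intensity can never exceed either component intensity.
        "overall": macs / (weights + activations),
    }


# Same 1x1 convolution (256 -> 256 channels) at two spatial sizes.
small = conv_intensity(4, 4, 256, 256, 1)
large = conv_intensity(56, 56, 256, 256, 1)
print(small)
print(large)
```

Under these assumptions the weight intensity equals the output spatial size `h * w`, so at 4x4 it is only 16 MACs per weight and it caps the overall intensity, while at 56x56 the activation term becomes the limit instead. This is the small-spatial-size regime the abstract describes.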