LG AR Jul 27, 2021

A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs

Hiroki Kawakami, Hirohisa Watanabe, Keisuke Sugiura, Hiroki Matsutani

arXiv:2107.12824v44.46 citations

Originality: Incremental advance

AI Analysis

This work addresses resource constraints for edge computing applications, but it is incremental as it builds on existing parameter reduction techniques.

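The paper tackles the challenge of deploying deep neural networks on edge devices with limited computational resources by proposing dsODENet, a compact model that combines Neural ODE with Depthwise Separable Convolution. Relative to the authors' baseline Neural ODE implementation, it reduces the parameter size (excluding pre- and post-processing layers) by 54.2% to 79.8%, and the FPGA implementation runs inference 23.8 times faster than a software counterpart.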

High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Because of their high computational complexity, DNNs are challenging to deploy on edge devices with strict limits on computational resources. In this paper, we derive a compact yet highly accurate DNN model, termed dsODENet, by combining two recently proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits a similarity between ResNet and ODEs and shares most of the weight parameters among multiple layers, which greatly reduces memory consumption. We apply dsODENet to domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, in which all the parameters and feature maps, except for those of the pre- and post-processing layers, can be mapped onto on-chip memories. It is implemented on a Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup rate compared to a software counterpart. The results demonstrate that dsODENet achieves comparable or slightly better domain adaptation accuracy than our baseline Neural ODE implementation, while the total parameter size, excluding the pre- and post-processing layers, is reduced by 54.2% to 79.8%. Our FPGA implementation accelerates inference by 23.8 times.
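To make the two parameter reduction ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes bias-free convolutions, a 3x3 kernel by default, and a forward Euler solver for the ODE block; the function names (`conv_params`, `dsc_params`, `euler_integrate`, and so on) are hypothetical and chosen only for illustration. The paper's actual layer shapes and solver may differ, so the numbers this produces are not expected to match the reported 54.2% to 79.8% reduction.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias omitted by assumption)."""
    return k * k * c_in * c_out


def dsc_params(c_in, c_out, k=3):
    """Weights in a depthwise separable convolution:
    one k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def stacked_params(block_params, num_blocks):
    """ResNet-style stack: every block owns its own copy of the weights."""
    return block_params * num_blocks


def ode_block_params(block_params, num_steps):
    """Neural-ODE-style block: one weight set reused at every solver step,
    so the count does not grow with num_steps."""
    return block_params


def euler_integrate(f, z0, num_steps, t0=0.0, t1=1.0):
    """Forward Euler: z <- z + h * f(z, t), calling the same f at each step.
    A residual update z + f(z) corresponds to one such step with h = 1."""
    h = (t1 - t0) / num_steps
    z, t = z0, t0
    for _ in range(num_steps):
        z = z + h * f(z, t)
        t += h
    return z


if __name__ == "__main__":
    std = conv_params(64, 64)   # illustrative channel counts, not from the paper
    dsc = dsc_params(64, 64)
    print("standard conv weights:", std)
    print("depthwise separable weights:", dsc)
    print("10 separate blocks:", stacked_params(std, 10))
    print("1 shared ODE block, 10 steps:", ode_block_params(dsc, 10))
```

The two savings compose: weight sharing removes the dependence on the number of solver steps, and the depthwise separable factorization shrinks the single shared weight set that remains.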
