A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices
This work addresses suboptimal deep learning performance on resource-constrained IoT devices and offers a practical solution for embedded systems, though it appears incremental because it builds on existing co-design concepts.
The paper tackles the challenge of optimizing deep learning for IoT devices by proposing a bi-directional co-design approach that jointly optimizes DNN models and deployment configurations on FPGAs, achieving state-of-the-art results in accuracy (IoU), throughput (FPS), and energy efficiency.
Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging because it is difficult to achieve both good quality of results (QoR), such as DNN inference accuracy, and good quality of service (QoS), such as inference latency, throughput, and power consumption. Existing approaches typically separate DNN model development from deployment on IoT devices, which results in suboptimal solutions. In this paper, we first present several interesting but counterintuitive observations about this separate design approach and show empirically why it may lead to suboptimal designs. Motivated by these observations, we then propose a novel and practical bi-directional co-design approach: a bottom-up DNN model design strategy combined with a top-down flow for DNN accelerator design. It enables joint optimization of both DNN models and their deployment configurations on IoT devices, represented here by FPGAs. We demonstrate the effectiveness of the proposed co-design approach on a real-life object detection application using a Pynq-Z1 embedded FPGA. Our method obtains state-of-the-art results on both QoR, with high accuracy (IoU), and QoS, with high throughput (FPS) and high energy efficiency.
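To make the reported metrics concrete, here is a minimal sketch of how IoU for a single predicted bounding box and a throughput-per-watt figure might be computed. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the definition of energy efficiency as FPS per watt are assumptions for illustration; the abstract does not specify how the paper measures them.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def energy_efficiency(fps, power_watts):
    """Frames per second per watt (an assumed definition, not the paper's)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return fps / power_watts


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth))
    print(energy_efficiency(fps=20.0, power_watts=4.0))
```

A co-design flow of the kind described would evaluate candidate model and accelerator configurations against both kinds of metric together, rather than fixing the model first and measuring throughput and power afterward.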