TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference
This work addresses efficient deployment of deep learning models on resource-constrained edge microcontrollers for time-series applications. It is an incremental improvement whose contribution lies in hardware-specific optimizations.
The paper optimizes Temporal Convolutional Networks (TCNs) for time-series inference on ultra-low power edge devices by introducing an automated exploration approach and a library of optimized kernels. It reports up to 103x lower latency and 20.3x lower energy than the Cube-AI toolkit running on the STM32L4, and 2.9x to 26.6x lower energy than commercial closed-source and academic open-source approaches on the same PULP hardware.
Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series analysis. We introduce an automated exploration approach and a library of optimized kernels to map TCNs on Parallel Ultra-Low Power (PULP) microcontrollers. Our approach minimizes latency and energy by exploiting a layer tiling optimizer to jointly find the tiling dimensions and select among alternative implementations of the causal and dilated 1D-convolution operations at the core of TCNs. We benchmark our approach on a commercial PULP device, achieving up to 103X lower latency and 20.3X lower energy than the Cube-AI toolkit executed on the STM32L4 and from 2.9X to 26.6X lower energy compared to commercial closed-source and academic open-source approaches on the same hardware target.
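To make the core operation concrete, the following is a minimal reference sketch of a causal, dilated 1D convolution in plain Python. It is not the paper's optimized kernel library. It assumes a single input channel, a single output channel, and zero padding on the left, and the name `causal_dilated_conv1d` is hypothetical. Each output sample depends only on the current and past inputs, spaced `dilation` steps apart.

```python
def causal_dilated_conv1d(x, w, dilation=1):
    """Reference causal dilated 1D convolution (single channel, illustrative only).

    Computes y[t] = sum_k w[k] * x[t - dilation * k],
    treating x[i] as 0 for i < 0 (implicit left zero padding).
    """
    y = []
    for t in range(len(x)):
        acc = 0
        for k, wk in enumerate(w):
            i = t - dilation * k  # index of the past sample this tap reads
            if i >= 0:            # taps that fall before the start read zeros
                acc += wk * x[i]
        y.append(acc)
    return y


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5]
    taps = [1, 1]
    # With dilation 2, each output is x[t] + x[t - 2].
    print(causal_dilated_conv1d(signal, taps, dilation=2))
```

Because no tap reads a future sample, changing `x[t]` never alters outputs before time `t`. Increasing the dilation widens the receptive field without adding filter taps, which is why alternative implementations of this one operation can differ in memory access pattern and speed.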