A 1-D CNN inference engine for constrained platforms
This work reduces the inference latency and memory footprint of 1D-CNNs on single-threaded edge devices, where buffered inference can interfere with time-critical tasks such as sample acquisition. The contribution is an incremental improvement.
The paper addresses inference latency and memory usage for 1D-CNNs on constrained edge devices. It proposes an inference scheme that interleaves convolution operations between sample intervals, giving a 10% reduction in inference delay and almost halving memory usage compared to TFLite's inference method.
1D-CNNs are used for time series classification in various domains with a high degree of accuracy. Most implementations collect the incoming data samples in a buffer before performing inference on them. On edge devices, which are typically constrained and single-threaded, such an implementation may interfere with time-critical tasks. One such task is sample acquisition. In this work, we propose an inference scheme that interleaves the convolution operations between sample intervals, which allows us to reduce the inference latency. Furthermore, our scheme is well-suited for storing data in ring buffers, yielding a small memory footprint. We demonstrate these improvements by comparing our approach to TFLite's inference method, giving a 10% reduction in the inference delay while almost halving the memory usage. Our approach is feasible on common consumer devices, which we show using an AVR-based Arduino board and an ARM-based Arduino board.
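
The interleaving idea can be illustrated with a minimal sketch. It is not the authors' implementation: the names `StreamingConv1D`, `push`, and `batch_conv1d` are hypothetical, and the sketch assumes a single-channel layer with stride 1 and no padding. Each arriving sample is written into a ring buffer that holds only the last K samples, and one convolution output is computed in the gap before the next sample. A conventional buffered convolution over the full window is included for comparison.

```python
class StreamingConv1D:
    """Single-channel, stride-1, unpadded 1-D convolution computed per sample.

    Hypothetical illustration: only the last k samples are kept in a ring
    buffer, so memory scales with the kernel size, not the window length.
    """

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias
        self.k = len(self.weights)
        self.ring = [0.0] * self.k  # storage for the last k samples
        self.head = 0               # next slot to overwrite
        self.count = 0              # number of samples seen so far

    def push(self, sample):
        """Store one sample; return an output once k samples are available."""
        self.ring[self.head] = sample
        self.head = (self.head + 1) % self.k
        self.count += 1
        if self.count < self.k:
            return None
        # After the write, head points at the oldest sample in the window.
        acc = self.bias
        for i in range(self.k):
            acc += self.weights[i] * self.ring[(self.head + i) % self.k]
        return acc


def batch_conv1d(samples, weights, bias=0.0):
    """Reference: buffer every sample first, then convolve in one pass."""
    k = len(weights)
    return [
        bias + sum(weights[i] * samples[n + i] for i in range(k))
        for n in range(len(samples) - k + 1)
    ]


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights = [0.5, -1.0, 2.0]

layer = StreamingConv1D(weights, bias=0.25)
streamed = []
for s in samples:  # on a device, this loop body would run once per sample interval
    y = layer.push(s)
    if y is not None:
        streamed.append(y)

batched = batch_conv1d(samples, weights, bias=0.25)
print(streamed == batched)
```

In this sketch the streaming layer stores 3 samples regardless of how long the input window is, and the work per sample interval is bounded by the kernel size. A multi-layer network would chain such layers, feeding each output into the next layer's ring buffer.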