Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications
This work addresses the deployment of efficient machine learning models on highly constrained IoT devices and offers an incremental improvement over existing compression techniques.
The paper tackles the challenge of running machine learning on IoT devices with power and storage constraints by proposing a hybrid neural-tree network with ternary quantization. Relative to a state-of-the-art keyword-spotting network, the hybrid model reduces computations by 11.1%, model size by 52.2%, and memory footprint by 30.6%, with negligible accuracy loss.
Machine learning-based applications are increasingly prevalent in IoT devices. The power and storage constraints of these devices make it particularly challenging to run modern neural networks, limiting the number of new applications that can be deployed on an IoT system. A number of compression techniques have been proposed, each with its own trade-offs. We propose a hybrid network that combines the strengths of current neural- and tree-based learning techniques with ternary quantization, and we present a detailed analysis of the associated model design space. Using this hybrid model, we obtained an 11.1% reduction in the number of computations, a 52.2% reduction in the model size, and a 30.6% reduction in the overall memory footprint over a state-of-the-art keyword-spotting neural network, with negligible loss in accuracy.
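To make the term concrete, the following is a minimal sketch of ternary quantization in general, not the scheme used in this paper, which the abstract does not specify. It assumes a common threshold heuristic (a fixed fraction of the mean absolute weight) and a single shared scale per weight group. The names `ternarize`, `dequantize`, `packed_size_bytes`, and the parameter `delta_factor` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def ternarize(weights: List[float], delta_factor: float = 0.7) -> Tuple[List[int], float]:
    """Map float weights to codes in {-1, 0, +1} plus one shared scale.

    Assumption: the threshold is delta_factor * mean(|w|), a widely used
    heuristic. The paper's actual quantization scheme may differ.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = delta_factor * mean_abs
    # Weights with small magnitude become 0; the rest keep only their sign.
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    # The shared scale is the mean magnitude of the weights that survived.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Reconstruct approximate float weights from ternary codes."""
    return [scale * c for c in codes]


def packed_size_bytes(num_weights: int, bits_per_weight: int = 2) -> int:
    """Storage for the codes alone, assuming 2 bits per ternary weight."""
    return (num_weights * bits_per_weight + 7) // 8


if __name__ == "__main__":
    w = [0.9, -0.8, 0.05, -0.02, 0.6]
    codes, scale = ternarize(w)
    print(codes, scale)
    print(dequantize(codes, scale))
    print(packed_size_bytes(1000), "bytes vs", 1000 * 4, "bytes as float32")
```

Under the 2-bit packing assumed here, the codes take one sixteenth of the space of 32-bit floats. The 52.2% model-size reduction reported above is measured on the full hybrid model against an existing keyword-spotting baseline, so it should not be read as following from this sketch.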