A JIT Compiler for Neural Network Inference
This work addresses the runtime efficiency of neural network inference, particularly on resource-constrained platforms such as the NAO V6. The contribution is incremental, since it builds on prior compiler techniques.
The authors developed a just-in-time (JIT) compiler for neural network inference that generates optimized machine code at runtime. It is significantly faster than existing implementations on small networks but slower on large ones.
This paper describes a C++ library that compiles neural network models at runtime into machine code that performs inference. In principle, this approach promises the best possible performance, because properties of the network that are known statically can be integrated directly into the generated code. In our experiments on the NAO V6 platform, it significantly outperforms existing implementations on small networks but performs worse on large networks. The library was already part of the 2018 B-Human code release; it has since been extended and is now available as a standalone version that can be integrated into any C++14 code base.
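To illustrate what integrating statically known properties into the code can mean, the following minimal sketch contrasts a generic dense layer with a version specialized by hand for one fixed layer. It is not the library's API and does not generate machine code at runtime. The function names, the 3-input/2-output layer, and its weights are hypothetical, chosen only to show the kind of code a specializing compiler could emit once sizes and weights are known.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Generic dense layer with ReLU: sizes and weights are only known at run
// time, so loop bounds are variable and every weight is loaded from memory.
// Weights are stored row-major (outputs x inputs).
std::vector<float> denseGeneric(const std::vector<float>& weights,
                                const std::vector<float>& bias,
                                const std::vector<float>& input)
{
  const std::size_t outputs = bias.size();
  const std::size_t inputs = input.size();
  assert(weights.size() == outputs * inputs);

  std::vector<float> output(outputs);
  for(std::size_t o = 0; o < outputs; ++o)
  {
    float sum = bias[o];
    for(std::size_t i = 0; i < inputs; ++i)
      sum += weights[o * inputs + i] * input[i];
    output[o] = sum > 0.f ? sum : 0.f; // ReLU
  }
  return output;
}

// Hand-specialized version for one hypothetical layer (assumed values):
//   weights = [[1, 2, -1], [-2, 0, 0.5]], bias = [0.5, 0.25]
// Because everything about the layer is known, the loops are fully
// unrolled, the weights become immediate constants, the multiplication
// by the zero weight is dropped, and no heap allocation is needed.
std::array<float, 2> denseSpecialized(const std::array<float, 3>& x)
{
  const float y0 = 0.5f + 1.f * x[0] + 2.f * x[1] - 1.f * x[2];
  const float y1 = 0.25f - 2.f * x[0] + 0.5f * x[2]; // zero weight for x[1] elided
  return {{y0 > 0.f ? y0 : 0.f, y1 > 0.f ? y1 : 0.f}};
}
```

Both functions compute the same layer. A JIT compiler can produce the second form automatically for whatever model is loaded at runtime, whereas this sketch hard-codes it for a single layer.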