LG SE Jul 26, 2022

NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers

Jiawei Liu, Jinkun Lin, Fabian Ruffy, Cheng Tan, Jinyang Li, Aurojit Panda, Lingming Zhang

arXiv:2207.13066v219.2119 citations h-index: 49 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of ensuring correctness in DL compilers, which is critical for reliable AI applications, though it is an incremental improvement in testing methods.

The authors tackle the problem of finding bugs in deep-learning compilers such as TVM and TensorRT, where a compiler bug can cause incorrect model outputs. They propose NNSmith, a fuzz testing approach that generates diverse yet valid test models, uses gradient-based search to find model inputs that avoid floating-point exceptional values, and applies differential testing to identify bugs. NNSmith found 72 new bugs, of which 58 were confirmed and 51 were fixed.

Deep-learning (DL) compilers such as TVM and TensorRT are increasingly being used to optimize deep neural network (DNN) models to meet performance, resource utilization and other requirements. Bugs in these compilers can result in models whose semantics differ from the original ones, producing incorrect results that corrupt the correctness of downstream applications. However, finding bugs in these compilers is challenging due to their complexity. In this work, we propose a new fuzz testing approach for finding bugs in deep-learning compilers. Our core approach consists of (i) generating diverse yet valid DNN test models that can exercise a large part of the compiler's transformation logic using light-weight operator specifications; (ii) performing gradient-based search to find model inputs that avoid any floating-point exceptional values during model execution, reducing the chance of missed bugs or false alarms; and (iii) using differential testing to identify bugs. We implemented this approach in NNSmith, which has found 72 new bugs for TVM, TensorRT, ONNXRuntime, and PyTorch to date. Of these, 58 have been confirmed and 51 have been fixed by their respective project maintainers.
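
Steps (ii) and (iii) of the abstract can be illustrated with a small sketch. This is not NNSmith's implementation: the toy model, the hinge-style loss, the seeded bug, and every function name below are hypothetical, chosen only to show the general idea of steering an input into an operator's valid domain with gradient steps and then comparing a reference result against a "compiled" one. Pure-Python `math.log` raises `ValueError` outside its domain, where a tensor library would typically produce NaN.

```python
import math

# Hypothetical toy "model": y = log(x - 2) + sqrt(x).
# It is only well defined when x - 2 > 0.
def reference_model(x: float) -> float:
    return math.log(x - 2.0) + math.sqrt(x)

# Hypothetical stand-in for a correct compiler output:
# an algebraically equivalent rewrite (sqrt(x) == x ** 0.5).
def compiled_model(x: float) -> float:
    return math.log(x - 2.0) + x ** 0.5

# Hypothetical stand-in for a miscompiled model (seeded 1% error).
def buggy_compiled_model(x: float) -> float:
    return math.log(x - 2.0) + math.sqrt(x) * 1.01

def domain_loss_and_grad(x: float, margin: float = 0.1):
    """Hinge loss that is positive while log's argument is below `margin`.

    Returns (loss, d loss / d x).
    """
    violation = margin - (x - 2.0)
    if violation > 0.0:
        return violation, -1.0
    return 0.0, 0.0

def search_valid_input(x0: float, lr: float = 0.5, max_steps: int = 100) -> float:
    """Gradient descent on the input until the domain loss reaches zero."""
    x = x0
    for _ in range(max_steps):
        loss, grad = domain_loss_and_grad(x)
        if loss == 0.0:
            return x
        x -= lr * grad
    raise RuntimeError("no valid input found within max_steps")

def outputs_differ(f, g, x: float, rtol: float = 1e-6) -> bool:
    """Differential test: flag a mismatch beyond a small tolerance."""
    return not math.isclose(f(x), g(x), rel_tol=rtol, abs_tol=1e-9)

# Start from an input that would make log() fail, then repair it.
x_valid = search_valid_input(-3.0)
print("valid input:", x_valid)
print("equivalent rewrite flagged:", outputs_differ(reference_model, compiled_model, x_valid))
print("seeded bug flagged:", outputs_differ(reference_model, buggy_compiled_model, x_valid))
```

Searching for a valid input first matters because a comparison made on an input that triggers an exceptional value tells little about the compiler: both sides may fail or return NaN, which can hide a real discrepancy or raise a spurious one.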
