SEAug 13, 2020

Graph-Based Fuzz Testing for Deep Learning Inference Engine

arXiv:2008.05933v245 citations
AI Analysis

This addresses quality assurance for deep learning systems, focusing on inference engines rather than models, which is an incremental but specific improvement for developers and users of DL frameworks.

The paper tackles the problem of testing deep learning inference engines by proposing a graph-based fuzz testing method, which discovered over 40 exceptions and improved operator-level coverage by up to 8.2% on average compared to random methods.

With the wide use of Deep Learning (DL) systems, academy and industry begin to pay attention to their quality. Testing is one of the major methods of quality assurance. However, existing testing techniques focus on the quality of DL models but lacks attention to the core underlying inference engines (i.e., frameworks and libraries). Inspired by the success stories of fuzz testing, we design a graph-based fuzz testing method to improve the quality of DL inference engines. This method is naturally followed by the graph structure of DL models. A novel operator-level coverage criterion based on graph theory is introduced and six different mutations are implemented to generate diversified DL models by exploring combinations of model structures, parameters, and data inputs. The Monte Carlo Tree Search (MCTS) is used to drive DL model generation without a training process. The experimental results show that the MCTS outperforms the random method in boosting operator-level coverage and detecting exceptions. Our method has discovered more than 40 different exceptions in three types of undesired behaviors: model conversion failure, inference failure, output comparison failure. The mutation strategies are useful to generate new valid test inputs, by up to 8.2% more operator-level coverage on average and 8.6 more exceptions captured.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes