SE LG Dec 26, 2021

Silent Bugs in Deep Learning Frameworks: An Empirical Study of Keras and TensorFlow

Florian Tambon, Amin Nikanjam, Le An, Foutse Khomh, Giuliano Antoniol

arXiv:2112.13314v224.161 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of hidden bugs in deep learning frameworks, giving developers empirical data and guidelines to improve reliability.

This paper conducted the first empirical study of silent bugs in Keras and TensorFlow, identifying 77 reproducible silent bugs from 1,168 GitHub issues and categorizing them by their impact on users' programs.

Deep Learning (DL) frameworks are now widely used, simplifying the creation of complex models and their integration into various applications, even for non-DL experts. However, like any other programs, they are prone to bugs. This paper deals with a subcategory of bugs called silent bugs: they lead to wrong behavior, but they do not cause system crashes or hangs, nor do they show an error message to the user. Such bugs are even more dangerous in DL applications and frameworks because of the "black-box" and stochastic nature of these systems (the end user cannot understand how the model makes decisions). This paper presents the first empirical study of Keras and TensorFlow silent bugs and their impact on users' programs. We extracted closed issues related to Keras from the TensorFlow GitHub repository. Out of the 1,168 issues that we gathered, 77 were reproducible silent bugs affecting users' programs. We categorized the bugs based on their effects on the users' programs and the components where the issues occurred, using information from the issue reports. We then derived a threat level for each issue based on the impact it had on the users' programs. To assess the relevance of the identified categories and the impact scale, we conducted an online survey with 103 DL developers. The participants generally agreed that silent bugs in DL libraries have a significant impact and acknowledged our findings (i.e., the categories of silent bugs and the proposed impact scale). Finally, leveraging our analysis, we provide a set of guidelines to facilitate safeguarding against such bugs in DL frameworks.
