Taxonomy of Real Faults in Deep Learning Systems
This work addresses the need for systematic fault analysis in deep learning systems, particularly in safety-critical domains. It provides a taxonomy of faults derived from empirical data and validated with a developer survey, making it an incremental contribution to software engineering for AI.
To understand the faults that occur in deep learning systems, the authors built a large taxonomy from a manual analysis of 1059 artefacts gathered from GitHub and Stack Overflow, enriched by structured interviews with 20 researchers and practitioners. They then validated the taxonomy with a survey of 21 additional developers, in which 13 of the 15 fault categories had been experienced by at least 50% of the participants.
The growing application of deep neural networks in safety-critical domains makes the analysis of faults that occur in such systems of enormous importance. In this paper we introduce a large taxonomy of faults in deep learning (DL) systems. We have manually analysed 1059 artefacts gathered from GitHub commits and issues of projects that use the most popular DL frameworks (TensorFlow, Keras and PyTorch) and from related Stack Overflow posts. Structured interviews with 20 researchers and practitioners describing the problems they have encountered in their experience have enriched our taxonomy with a variety of additional faults that did not emerge from the other two sources. Our final taxonomy was validated with a survey involving an additional set of 21 developers, confirming that almost all fault categories (13/15) were experienced by at least 50% of the survey participants.
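The validation criterion above can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' analysis script: the function names `min_respondents` and `share` are hypothetical, and only the figures stated in the abstract (21 participants, 15 categories, 13 confirmed, a 50% threshold) are used.

```python
import math


def min_respondents(total: int, threshold: float = 0.5) -> int:
    """Smallest number of respondents that reaches `threshold` of `total`.

    Hypothetical helper for illustration only.
    """
    return math.ceil(total * threshold)


def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


# Figures taken from the abstract.
participants = 21
categories_total = 15
categories_confirmed = 13

# Number of participants needed for a category to count as "experienced by at least 50%".
print(min_respondents(participants))

# Share of fault categories that met the criterion.
print(f"{share(categories_confirmed, categories_total):.1%}")
```

Under these figures, a category meets the criterion when at least 11 of the 21 participants report having experienced it, and 13 of 15 categories corresponds to roughly 86.7% of the taxonomy.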