LGMar 24, 2022
Repairing Group-Level Errors for DNNs Using Weighted RegularizationZiyuan Zhong, Yuchi Tian, Conor J. Sweeney et al.
Deep Neural Networks (DNNs) have been widely used in software making decisions impacting people's lives. However, they have been found to exhibit severe erroneous behaviors that may lead to unfortunate outcomes. Previous work shows that such misbehaviors often occur due to class property violations rather than errors on a single image. Although methods for detecting such errors have been proposed, fixing them has not been studied so far. Here, we propose a generic method called Weighted Regularization (WR) consisting of five concrete methods targeting the error-producing classes to fix the DNNs. In particular, it can repair confusion error and bias error of DNN models for both single-label and multi-label image classifications. A confusion error happens when a given DNN model tends to confuse between two classes. Each method in WR assigns more weights at a stage of DNN retraining or inference to mitigate the confusion between target pair. A bias error can be fixed similarly. We evaluate and compare the proposed methods along with baselines on six widely-used datasets and architecture combinations. The results suggest that WR methods have different trade-offs but under each setting at least one WR method can greatly reduce confusion/bias errors at a very limited cost of the overall performance.
SEMar 30, 2020Code
Code Prediction by Feeding Trees to TransformersSeohyun Kim, Jinman Zhao, Yuchi Tian et al.
We advance the state-of-the-art in the accuracy of code prediction (next token prediction) used in autocomplete systems. First, we report that using the recently proposed Transformer architecture even out-of-the-box outperforms previous neural and non-neural systems for code prediction. We then show that by making the Transformer architecture aware of the syntactic structure of code, we further increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of an RNN-based system (similar to Hellendoorn et al. 2018) by 18.3%, the Deep3 system (Raychev et al 2016) by 14.1%, and an adaptation of Code2Seq (Alon et al., 2018) for code prediction by 14.4%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset, as well as on a Facebook internal Python corpus. Our code and data preparation pipeline will be available in open source.
SEOct 9, 2020
Understanding Local Robustness of Deep Neural Networks under Natural VariationsZiyuan Zhong, Yuchi Tian, Baishakhi Ray
Deep Neural Networks (DNNs) are being deployed in a wide range of settings today, from safety-critical applications like autonomous driving to commercial applications involving image classifications. However, recent research has shown that DNNs can be brittle to even slight variations of the input data. Therefore, rigorous testing of DNNs has gained widespread attention. While DNN robustness under norm-bound perturbation got significant attention over the past few years, our knowledge is still limited when natural variants of the input images come. These natural variants, e.g. a rotated or a rainy version of the original input, are especially concerning as they can occur naturally in the field without any active adversary and may lead to undesirable consequences. Thus, it is important to identify the inputs whose small variations may lead to erroneous DNN behaviors. The very few studies that looked at DNN's robustness under natural variants, however, focus on estimating the overall robustness of DNNs across all the test data rather than localizing such error-producing points. This work aims to bridge this gap. To this end, we study the local per-input robustness properties of the DNNs and leverage those properties to build a white-box (DeepRobust-W) and a black-box (DeepRobust-B) tool to automatically identify the non-robust points. Our evaluation of these methods on three DNN models spanning three widely used image classification datasets shows that they are effective in flagging points of poor robustness. In particular, DeepRobust-W and DeepRobust-B are able to achieve an F1 score of up to 91.4% and 99.1%, respectively. We further show that DeepRobust-W can be applied to a regression problem in another domain. Our evaluation on three self-driving car models demonstrates that DeepRobust-W is effective in identifying points of poor robustness with F1 score up to 78.9%.
SEMay 20, 2019
Testing DNN Image Classifiers for Confusion & Bias ErrorsYuchi Tian, Ziyuan Zhong, Vicente Ordonez et al.
Image classifiers are an important component of today's software, from consumer and business applications to safety-critical domains. The advent of Deep Neural Networks (DNNs) is the key catalyst behind such wide-spread success. However, wide adoption comes with serious concerns about the robustness of software systems dependent on DNNs for image classification, as several severe erroneous behaviors have been reported under sensitive and critical circumstances. We argue that developers need to rigorously test their software's image classifiers and delay deployment until acceptable. We present an approach to testing image classifier robustness based on class property violations. We found that many of the reported erroneous cases in popular DNN image classifiers occur because the trained models confuse one class with another or show biases towards some classes over others. These bugs usually violate some class properties of one or more of those classes. Most DNN testing techniques focus on per-image violations, so fail to detect class-level confusions or biases. We developed a testing technique to automatically detect class-based confusion and bias errors in DNN-driven image classification software. We evaluated our implementation, DeepInspect, on several popular image classifiers with precision up to 100% (avg.~72.6%) for confusion errors, and up to 84.3% (avg.~66.8%) for bias errors. DeepInspect found hundreds of classification mistakes in widely-used models, many exposing errors indicating confusion or bias.
SEAug 28, 2017
DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous CarsYuchi Tian, Kexin Pei, Suman Jana et al.
Recent advances in Deep Neural Networks (DNNs) have led to the development of DNN-driven autonomous cars that, using sensors like camera, LiDAR, etc., can drive without any human intervention. Most major manufacturers including Tesla, GM, Ford, BMW, and Waymo/Google are working on building and testing different types of autonomous vehicles. The lawmakers of several US states including California, Texas, and New York have passed new legislation to fast-track the process of testing and deployment of autonomous vehicles on their roads. However, despite their spectacular progress, DNNs, just like traditional software, often demonstrate incorrect or unexpected corner case behaviors that can lead to potentially fatal collisions. Several such real-world accidents involving autonomous cars have already happened including one which resulted in a fatality. Most existing testing techniques for DNN-driven vehicles are heavily dependent on the manual collection of test data under different driving conditions which become prohibitively expensive as the number of test conditions increases. In this paper, we design, implement and evaluate DeepTest, a systematic testing tool for automatically detecting erroneous behaviors of DNN-driven vehicles that can potentially lead to fatal crashes. First, our tool is designed to automatically generated test cases leveraging real-world changes in driving conditions like rain, fog, lighting conditions, etc. DeepTest systematically explores different parts of the DNN logic by generating test inputs that maximize the numbers of activated neurons. DeepTest found thousands of erroneous behaviors under different realistic driving conditions (e.g., blurring, rain, fog, etc.) many of which lead to potentially fatal crashes in three top performing DNNs in the Udacity self-driving car challenge.