James Sharp

7papers

493citations

Novelty39%

AI Score28

Ranked #157,596 of 201,326 authors (top 78%)#34,744 in LG (top 82%)

7 Papers

LGMar 5, 2021Code

Abstraction and Symbolic Execution of Deep Neural Networks with Bayesian Approximation of Hidden Features

Nicolas Berthier, Amany Alshareef, James Sharp et al.

Intensive research has been conducted on the verification and validation of deep neural networks (DNNs), aiming to understand if, and how, DNNs can be applied to safety critical applications. However, existing verification and validation techniques are limited by their scalability, over both the size of the DNN and the size of the dataset. In this paper, we propose a novel abstraction method which abstracts a DNN and a dataset into a Bayesian network (BN). We make use of dimensionality reduction techniques to identify hidden features that have been learned by hidden layers of the DNN, and associate each hidden feature with a node of the BN. On this BN, we can conduct probabilistic inference to understand the behaviours of the DNN processing data. More importantly, we can derive a runtime monitoring approach to detect in operational time rare inputs and covariate shift of the input data. We can also adapt existing structural coverage-guided testing techniques (i.e., based on low-level elements of the DNN such as neurons), in order to generate test cases that better exercise hidden features. We implement and evaluate the BN abstraction technique using our DeepConcolic tool available at https://github.com/TrustAI/DeepConcolic.

NEJun 20, 2019Code

testRNN: Coverage-guided Testing on Recurrent Neural Networks

Wei Huang, Youcheng Sun, Xiaowei Huang et al.

Recurrent neural networks (RNNs) have been widely applied to various sequential tasks such as text processing, video recognition, and molecular property prediction. We introduce the first coverage-guided testing tool, coined testRNN, for the verification and validation of a major class of RNNs, long short-term memory networks (LSTMs). The tool implements a generic mutation-based test case generation method, and it empirically evaluates the robustness of a network using three novel LSTM structural test coverage metrics. Moreover, it is able to help the model designer go through the internal data flow processing of the LSTM layer. The tool is available through: https://github.com/TrustAI/testRNN under the BSD 3-Clause licence.

LGMar 10, 2018Code

Testing Deep Neural Networks

Youcheng Sun, Xiaowei Huang, Daniel Kroening et al.

Deep neural networks (DNNs) have a wide range of applications, and software employing them must be thoroughly tested, especially in safety-critical domains. However, traditional software test coverage metrics cannot be applied directly to DNNs. In this paper, inspired by the MC/DC coverage criterion, we propose a family of four novel test criteria that are tailored to structural features of DNNs and their semantics. We validate the criteria by demonstrating that the generated test inputs guided via our proposed coverage criteria are able to capture undesired behaviours in a DNN. Test cases are generated using a symbolic approach and a gradient-based heuristic search. By comparing them with existing methods, we show that our criteria achieve a balance between their ability to find bugs (proxied using adversarial examples) and the computational cost of test case generation. Our experiments are conducted on state-of-the-art DNNs obtained using popular open source datasets, including MNIST, CIFAR-10 and ImageNet.

LGMar 7, 2020

A Safety Framework for Critical Systems Utilising Deep Neural Networks

Xingyu Zhao, Alec Banks, James Sharp et al.

Increasingly sophisticated mathematical modelling processes from Machine Learning are being used to analyse complex data. However, the performance and explainability of these models within practical critical systems requires a rigorous and continuous verification of their safe utilisation. Working towards addressing this challenge, this paper presents a principled novel safety argument framework for critical systems that utilise deep neural networks. The approach allows various forms of predictions, e.g., future reliability of passing some demands, or confidence on a required reliability level. It is supported by a Bayesian analysis using operational data and the recent verification and validation techniques for deep learning. The prediction is conservative -- it starts with partial prior knowledge obtained from lifecycle activities and then determines the worst-case prediction. Open challenges are also identified.

CVFeb 6, 2020

Reliability Validation of Learning Enabled Vehicle Tracking

Youcheng Sun, Yifan Zhou, Simon Maskell et al.

This paper studies the reliability of a real-world learning-enabled system, which conducts dynamic vehicle tracking based on a high-resolution wide-area motion imagery input. The system consists of multiple neural network components -- to process the imagery inputs -- and multiple symbolic (Kalman filter) components -- to analyse the processed information for vehicle tracking. It is known that neural networks suffer from adversarial examples, which make them lack robustness. However, it is unclear if and how the adversarial examples over learning components can affect the overall system-level reliability. By integrating a coverage-guided neural network testing tool, DeepConcolic, with the vehicle tracking system, we found that (1) the overall system can be resilient to some adversarial examples thanks to the existence of other components, and (2) the overall system presents an extra level of uncertainty which cannot be determined by analysing the deep learning components only. This research suggests the need for novel verification and validation methods for learning-enabled systems.

LGNov 5, 2019

Coverage Guided Testing for Recurrent Neural Networks

Wei Huang, Youcheng Sun, Xingyu Zhao et al.

Recurrent neural networks (RNNs) have been applied to a broad range of applications, including natural language processing, drug discovery, and video recognition. Their vulnerability to input perturbation is also known. Aligning with a view from software defect detection, this paper aims to develop a coverage guided testing approach to systematically exploit the internal behaviour of RNNs, with the expectation that such testing can detect defects with high possibility. Technically, the long short term memory network (LSTM), a major class of RNNs, is thoroughly studied. A family of three test metrics are designed to quantify not only the values but also the temporal relations (including both step-wise and bounded-length) exhibited when LSTM processing inputs. A genetic algorithm is applied to efficiently generate test cases. The test metrics and test case generation algorithm are implemented into a tool TestRNN, which is then evaluated on a set of LSTM benchmarks. Experiments confirm that TestRNN has advantages over the state-of-art tool DeepStellar and attack-based defect detection methods, owing to its working with finer temporal semantics and the consideration of the naturalness of input perturbation. Furthermore, TestRNN enables meaningful information to be collected and exhibited for users to understand the testing results, which is an important step towards interpretable neural network testing.

LGDec 18, 2018

A Survey of Safety and Trustworthiness of Deep Neural Networks: Verification, Testing, Adversarial Attack and Defence, and Interpretability

Xiaowei Huang, Daniel Kroening, Wenjie Ruan et al.

In the past few years, significant progress has been made on deep neural networks (DNNs) in achieving human-level performance on several long-standing tasks. With the broader deployment of DNNs on various applications, the concerns over their safety and trustworthiness have been raised in public, especially after the widely reported fatal incidents involving self-driving cars. Research to address these concerns is particularly active, with a significant number of papers released in the past few years. This survey paper conducts a review of the current research effort into making DNNs safe and trustworthy, by focusing on four aspects: verification, testing, adversarial attack and defence, and interpretability. In total, we survey 202 papers, most of which were published after 2017.