Victoria Cox

SE
3papers
72citations
Novelty52%
AI Score25

3 Papers

SEMay 17, 2022
Hierarchical Distribution-Aware Testing of Deep Learning

Wei Huang, Xingyu Zhao, Alec Banks et al.

Deep Learning (DL) is increasingly used in safety-critical applications, raising concerns about its reliability. DL suffers from a well-known problem of lacking robustness, especially when faced with adversarial perturbations known as Adversarial Examples (AEs). Despite recent efforts to detect AEs using advanced attack and testing methods, these approaches often overlook the input distribution and perceptual quality of the perturbations. As a result, the detected AEs may not be relevant in practical applications or may appear unrealistic to human observers. This can waste testing resources on rare AEs that seldom occur during real-world use, limiting improvements in DL model dependability. In this paper, we propose a new robustness testing approach for detecting AEs that considers both the feature level distribution and the pixel level distribution, capturing the perceptual quality of adversarial perturbations. The two considerations are encoded by a novel hierarchical mechanism. First, we select test seeds based on the density of feature level distribution and the vulnerability of adversarial robustness. The vulnerability of test seeds are indicated by the auxiliary information, that are highly correlated with local robustness. Given a test seed, we then develop a novel genetic algorithm based local test case generation method, in which two fitness functions work alternatively to control the perceptual quality of detected AEs. Finally, extensive experiments confirm that our holistic approach considering hierarchical distributions is superior to the state-of-the-arts that either disregard any input distribution or only consider a single (non-hierarchical) distribution, in terms of not only detecting imperceptible AEs but also improving the overall robustness of the DL model under testing.

SENov 30, 2021
Reliability Assessment and Safety Arguments for Machine Learning Components in System Assurance

Yi Dong, Wei Huang, Vibhav Bharti et al.

The increasing use of Machine Learning (ML) components embedded in autonomous systems -- so-called Learning-Enabled Systems (LESs) -- has resulted in the pressing need to assure their functional safety. As for traditional functional safety, the emerging consensus within both, industry and academia, is to use assurance cases for this purpose. Typically assurance cases support claims of reliability in support of safety, and can be viewed as a structured way of organising arguments and evidence generated from safety analysis and reliability modelling activities. While such assurance activities are traditionally guided by consensus-based standards developed from vast engineering experience, LESs pose new challenges in safety-critical application due to the characteristics and design of ML models. In this article, we first present an overall assurance framework for LESs with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets to component-level requirements and supporting claims stated in reliability metrics. We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers that utilises the operational profile and robustness verification evidence. We discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM and propose solutions to practical use. Probabilistic safety argument templates at the lower ML component-level are also developed based on the RAM. Finally, to evaluate and demonstrate our methods, we not only conduct experiments on synthetic/benchmark datasets but also scope our methods with case studies on simulated Autonomous Underwater Vehicles and physical Unmanned Ground Vehicles.

LGJun 2, 2021
Assessing the Reliability of Deep Learning Classifiers Through Robustness Evaluation and Operational Profiles

Xingyu Zhao, Wei Huang, Alec Banks et al.

The utilisation of Deep Learning (DL) is advancing into increasingly more sophisticated applications. While it shows great potential to provide transformational capabilities, DL also raises new challenges regarding its reliability in critical functions. In this paper, we present a model-agnostic reliability assessment method for DL classifiers, based on evidence from robustness evaluation and the operational profile (OP) of a given application. We partition the input space into small cells and then "assemble" their robustness (to the ground truth) according to the OP, where estimators on the cells' robustness and OPs are provided. Reliability estimates in terms of the probability of misclassification per input (pmi) can be derived together with confidence levels. A prototype tool is demonstrated with simplified case studies. Model assumptions and extension to real-world applications are also discussed. While our model easily uncovers the inherent difficulties of assessing the DL dependability (e.g. lack of data with ground truth and scalability issues), we provide preliminary/compromised solutions to advance in this research direction.