Yanghao Zhang

CV
h-index27
11papers
390citations
Novelty43%
AI Score46

11 Papers

LGMar 3
IoUCert: Robustness Verification for Anchor-based Object Detectors

Benedikt Brückner, Alejandro J. Mercado, Yanghao Zhang et al.

While formal robustness verification has seen significant success in image classification, scaling these guarantees to object detection remains notoriously difficult due to complex non-linear coordinate transformations and Intersection-over-Union (IoU) metrics. We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in foundational anchor-based object detection architectures. Focusing on the object localisation component in single-object settings, we propose a coordinate transformation that enables our algorithm to circumvent precision-degrading relaxations of non-linear box prediction functions. This allows us to optimise bounds directly with respect to the anchor box offsets which enables a novel Interval Bound Propagation method that derives optimal IoU bounds. We demonstrate that our method enables, for the first time, the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.

CRJun 3, 2024Code
Safeguarding Large Language Models: A Survey

Yi Dong, Ronghui Mu, Yanghao Zhang et al.

In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms that major LLM service providers and the open-source community employ. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable properties that a guardrail might want to enforce, such as hallucinations, fairness, privacy, and so on. Based on them, we review techniques to circumvent these controls (i.e., attacks), to defend the attacks, and to reinforce the guardrails. While the techniques mentioned above represent the current status and the active research trends, we also discuss several challenges that cannot be easily dealt with by the methods and present our vision on how to implement a comprehensive guardrail through the full consideration of multi-disciplinary approach, neural-symbolic method, and systems development lifecycle.

SEAug 3, 2021Code
Tutorials on Testing Neural Networks

Nicolas Berthier, Youcheng Sun, Wei Huang et al.

Deep learning achieves remarkable performance on pattern recognition, but can be vulnerable to defects of some important properties such as robustness and security. This tutorial is based on a stream of research conducted since the summer of 2018 at a few UK universities, including the University of Liverpool, University of Oxford, Queen's University Belfast, University of Lancaster, University of Loughborough, and University of Exeter. The research aims to adapt software engineering methods, in particular software testing methods, to work with machine learning models. Software testing techniques have been successful in identifying software bugs, and helping software developers in validating the software they design and implement. It is for this reason that a few software testing techniques -- such as the MC/DC coverage metric -- have been mandated in industrial standards for safety critical systems, including the ISO26262 for automotive systems and the RTCA DO-178B/C for avionics systems. However, these techniques cannot be directly applied to machine learning models, because the latter are drastically different from traditional software, and their design follows a completely different development life-cycle. As the outcome of this thread of research, the team has developed a series of methods that adapt the software testing techniques to work with a few classes of machine learning models. The latter notably include convolutional neural networks, recurrent neural networks, and random forest. The tools developed from this research are now collected, and publicly released, in a GitHub repository: \url{https://github.com/TrustAI/DeepConcolic}, with the BSD 3-Clause licence. This tutorial is to go through the major functionalities of the tools with a few running examples, to exhibit how the developed techniques work, what the results are, and how to interpret them.

CVJan 4, 2021Code
Fooling Object Detectors: Adversarial Attacks by Half-Neighbor Masks

Yanghao Zhang, Fu Wang, Wenjie Ruan

Although there are a great number of adversarial attacks on deep learning based classifiers, how to attack object detection systems has been rarely studied. In this paper, we propose a Half-Neighbor Masked Projected Gradient Descent (HNM-PGD) based attack, which can generate strong perturbation to fool different kinds of detectors under strict constraints. We also applied the proposed HNM-PGD attack in the CIKM 2020 AnalytiCup Competition, which was ranked within the top 1% on the leaderboard. We release the code at https://github.com/YanghaoZYH/HNM-PGD.

CVOct 15, 2020Code
Generalizing Universal Adversarial Attacks Beyond Additive Perturbations

Yanghao Zhang, Wenjie Ruan, Fu Wang et al.

The previous study has shown that universal adversarial attacks can fool deep neural networks over a large set of input images with a single human-invisible perturbation. However, current methods for universal adversarial attacks are based on additive perturbation, which cause misclassification when the perturbation is directly added to the input images. In this paper, for the first time, we show that a universal adversarial attack can also be achieved via non-additive perturbation (e.g., spatial transformation). More importantly, to unify both additive and non-additive perturbations, we propose a novel unified yet flexible framework for universal adversarial attacks, called GUAP, which is able to initiate attacks by additive perturbation, non-additive perturbation, or the combination of both. Extensive experiments are conducted on CIFAR-10 and ImageNet datasets with six deep neural network models including GoogleLeNet, VGG16/19, ResNet101/152, and DenseNet121. The empirical experiments demonstrate that GUAP can obtain up to 90.9% and 99.24% successful attack rates on CIFAR-10 and ImageNet datasets, leading to over 15% and 19% improvements respectively than current state-of-the-art universal adversarial attacks. The code for reproducing the experiments in this paper is available at https://github.com/TrustAI/GUAP.

CVFeb 27, 2024
Towards Fairness-Aware Adversarial Learning

Yanghao Zhang, Tianle Zhang, Ronghui Mu et al.

Although adversarial training (AT) has proven effective in enhancing the model's robustness, the recently revealed issue of fairness in robustness has not been well addressed, i.e. the robust accuracy varies significantly among different categories. In this paper, instead of uniformly evaluating the model's average class performance, we delve into the issue of robust fairness, by considering the worst-case distribution across various classes. We propose a novel learning paradigm, named Fairness-Aware Adversarial Learning (FAAL). As a generalization of conventional AT, we re-define the problem of adversarial training as a min-max-max framework, to ensure both robustness and fairness of the trained model. Specifically, by taking advantage of distributional robust optimization, our method aims to find the worst distribution among different categories, and the solution is guaranteed to obtain the upper bound performance with high probability. In particular, FAAL can fine-tune an unfair robust model to be fair within only two epochs, without compromising the overall clean and robust accuracies. Extensive experiments on various image datasets validate the superior performance and efficiency of the proposed FAAL compared to other state-of-the-art methods.

LGDec 11, 2023
Reward Certification for Policy Smoothed Reinforcement Learning

Ronghui Mu, Leandro Soriano Marcolino, Tianle Zhang et al.

Reinforcement Learning (RL) has achieved remarkable success in safety-critical areas, but it can be weakened by adversarial attacks. Recent studies have introduced "smoothed policies" in order to enhance its robustness. Yet, it is still challenging to establish a provable guarantee to certify the bound of its total reward. Prior methods relied primarily on computing bounds using Lipschitz continuity or calculating the probability of cumulative reward above specific thresholds. However, these techniques are only suited for continuous perturbations on the RL agent's observations and are restricted to perturbations bounded by the $l_2$-norm. To address these limitations, this paper proposes a general black-box certification method capable of directly certifying the cumulative reward of the smoothed policy under various $l_p$-norm bounded perturbations. Furthermore, we extend our methodology to certify perturbations on action spaces. Our approach leverages f-divergence to measure the distinction between the original distribution and the perturbed distribution, subsequently determining the certification bound by solving a convex optimisation problem. We provide a comprehensive theoretical analysis and run sufficient experiments in multiple environments. Our results show that our method not only improves the certified lower bound of mean cumulative reward but also demonstrates better efficiency than state-of-the-art techniques.

CVDec 18, 2024
A Black-Box Evaluation Framework for Semantic Robustness in Bird's Eye View Detection

Fu Wang, Yanghao Zhang, Xiangyu Yin et al.

Camera-based Bird's Eye View (BEV) perception models receive increasing attention for their crucial role in autonomous driving, a domain where concerns about the robustness and reliability of deep learning have been raised. While only a few works have investigated the effects of randomly generated semantic perturbations, aka natural corruptions, on the multi-view BEV detection task, we develop a black-box robustness evaluation framework that adversarially optimises three common semantic perturbations: geometric transformation, colour shifting, and motion blur, to deceive BEV models, serving as the first approach in this emerging field. To address the challenge posed by optimising the semantic perturbation, we design a smoothed, distance-based surrogate function to replace the mAP metric and introduce SimpleDIRECT, a deterministic optimisation algorithm that utilises observed slopes to guide the optimisation process. By comparing with randomised perturbation and two optimisation baselines, we demonstrate the effectiveness of the proposed framework. Additionally, we provide a benchmark on the semantic robustness of ten recent BEV models. The results reveal that PolarFormer, which emphasises geometric information from multi-view images, exhibits the highest robustness, whereas BEVDet is fully compromised, with its precision reduced to zero.

AIMay 19, 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

Xiaowei Huang, Wenjie Ruan, Wei Huang et al.

Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.

LGMar 4, 2021
Dynamic Efficient Adversarial Training Guided by Gradient Magnitude

Fu Wang, Yanghao Zhang, Yanbin Zheng et al.

Adversarial training is an effective but time-consuming way to train robust deep neural networks that can withstand strong adversarial attacks. As a response to its inefficiency, we propose Dynamic Efficient Adversarial Training (DEAT), which gradually increases the adversarial iteration during training. We demonstrate that the gradient's magnitude correlates with the curvature of the trained model's loss landscape, allowing it to reflect the effect of adversarial training. Therefore, based on the magnitude of the gradient, we propose a general acceleration strategy, M+ acceleration, which enables an automatic and highly effective method of adjusting the training procedure. M+ acceleration is computationally efficient and easy to implement. It is suited for DEAT and compatible with the majority of existing adversarial training techniques. Extensive experiments have been done on CIFAR-10 and ImageNet datasets with various training environments. The results show that the proposed M+ acceleration significantly improves the training efficiency of existing adversarial training methods while achieving similar robustness performance. This demonstrates that the strategy is highly adaptive and offers a valuable solution for automatic adversarial training.

CVFeb 21, 2018
Collaboratively Weighting Deep and Classic Representation via L2 Regularization for Image Classification

Shaoning Zeng, Bob Zhang, Yanghao Zhang et al.

Deep convolutional neural networks provide a powerful feature learning capability for image classification. The deep image features can be utilized to deal with many image understanding tasks like image classification and object recognition. However, the robustness obtained in one dataset can be hardly reproduced in the other domain, which leads to inefficient models far from state-of-the-art. We propose a deep collaborative weight-based classification (DeepCWC) method to resolve this problem, by providing a novel option to fully take advantage of deep features in classic machine learning. It firstly performs the L2-norm based collaborative representation on the original images, as well as the deep features extracted by deep CNN models. Then, two distance vectors, obtained based on the pair of linear representations, are fused together via a novel collaborative weight. This collaborative weight enables deep and classic representations to weigh each other. We observed the complementarity between two representations in a series of experiments on 10 facial and object datasets. The proposed DeepCWC produces very promising classification results, and outperforms many other benchmark methods, especially the ones claimed for Fashion-MNIST. The code is going to be published in our public repository.