W. K. Chan

SE
h-index5
10papers
920citations
Novelty48%
AI Score42

10 Papers

LGAug 1, 2023
A Majority Invariant Approach to Patch Robustness Certification for Deep Learning Models

Qilin Zhou, Zhengyuan Wei, Haipeng Wang et al.

Patch robustness certification ensures no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert firstly finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.

SESep 5, 2023
A study on the impact of pre-trained model on Just-In-Time defect prediction

Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang et al.

Previous researchers conducting Just-In-Time (JIT) defect prediction tasks have primarily focused on the performance of individual pre-trained models, without exploring the relationship between different pre-trained models as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios. Our findings indicate that each model based on different backbones shows improvements, and when the backbone's pre-training model is similar, the training resources that need to be consumed are much more closer. We also observe that Commit code plays a significant role in defect detection, and different pre-trained models demonstrate better defect detection ability with a balanced dataset under few-shot scenarios. These results provide new insights for optimizing JIT defect prediction tasks using pre-trained models and highlight the factors that require more attention when constructing such models. Additionally, CodeGPTJIT and GPT2JIT achieved better performance than DeepJIT and CC2Vec on the two datasets respectively under 2000 training samples. These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction tasks, especially in scenarios with limited training data.

SEJul 19, 2024
A3Rank: Augmentation Alignment Analysis for Prioritizing Overconfident Failing Samples for Deep Learning Models

Zhengyuan Wei, Haipeng Wang, Qilin Zhou et al.

Sharpening deep learning models by training them with examples close to the decision boundary is a well-known best practice. Nonetheless, these models are still error-prone in producing predictions. In practice, the inference of the deep learning models in many application systems is guarded by a rejector, such as a confidence-based rejector, to filter out samples with insufficient prediction confidence. Such confidence-based rejectors cannot effectively guard against failing samples with high confidence. Existing test case prioritization techniques effectively distinguish confusing samples from confident samples to identify failing samples among the confusing ones, yet prioritizing the failing ones high among many confident ones is challenging. In this paper, we propose $A^3$Rank, a novel test case prioritization technique with augmentation alignment analysis, to address this problem. $A^3$Rank generates augmented versions of each test case and assesses the extent of the prediction result for the test case misaligned with these of the augmented versions and vice versa. Our experiment shows that $A^3$Rank can effectively rank failing samples escaping from the checking of confidence-based rejectors, which significantly outperforms the peer techniques by 163.63\% in the detection ratio of top-ranked samples. We also provide a framework to construct a detector devoted to augmenting these rejectors to defend these failing samples, and our detector can achieve a significantly higher defense success rate.

SEMay 13, 2024
CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models

Qilin Zhou, Zhengyuan Wei, Haipeng Wang et al.

Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim to label malicious samples with provable guarantees correctly and issue warnings for malicious samples predicted to non-benign labels with provable guarantees, respectively. However, existing certified detection defenders suffer from protecting labels subject to manipulation, and existing certified recovery defenders cannot systematically warn samples about their labels. A certified defense that simultaneously offers robust labels and systematic warning protection against patch attacks is desirable. This paper proposes a novel certified defense technique called CrossCert. CrossCert formulates a novel approach by cross-checking two certified recovery defenders to provide unwavering certification and detection certification. Unwavering certification ensures that a certified sample, when subjected to a patched perturbation, will always be returned with a benign label without triggering any warnings with a provable guarantee. To our knowledge, CrossCert is the first certified detection technique to offer this guarantee. Our experiments show that, with a slightly lower performance than ViP and comparable performance with PatchCensor in terms of detection certification, CrossCert certifies a significant proportion of samples with the guarantee of unwavering certification.

SEDec 5, 2025
Toward Patch Robustness Certification and Detection for Deep Learning Systems Beyond Consistent Samples

Qilin Zhou, Zhengyuan Wei, Haipeng Wang et al.

Patch robustness certification is an emerging kind of provable defense technique against adversarial patch attacks for deep learning systems. Certified detection ensures the detection of all patched harmful versions of certified samples, which mitigates the failures of empirical defense techniques that could (easily) be compromised. However, existing certified detection methods are ineffective in certifying samples that are misclassified or whose mutants are inconsistently pre icted to different labels. This paper proposes HiCert, a novel masking-based certified detection technique. By focusing on the problem of mutants predicted with a label different from the true label with our formal analysis, HiCert formulates a novel formal relation between harmful samples generated by identified loopholes and their benign counterparts. By checking the bound of the maximum confidence among these potentially harmful (i.e., inconsistent) mutants of each benign sample, HiCert ensures that each harmful sample either has the minimum confidence among mutants that are predicted the same as the harmful sample itself below this bound, or has at least one mutant predicted with a label different from the harmful sample itself, formulated after two novel insights. As such, HiCert systematically certifies those inconsistent samples and consistent samples to a large extent. To our knowledge, HiCert is the first work capable of providing such a comprehensive patch robustness certification for certified detection. Our experiments show the high effectiveness of HiCert with a new state-of the-art performance: It certifies significantly more benign samples, including those inconsistent and consistent, and achieves significantly higher accuracy on those samples without warnings and a significantly lower false silent ratio.

LGJul 31, 2025
Scalable and Precise Patch Robustness Certification for Deep Learning Models with Top-k Predictions

Qilin Zhou, Haipeng Wang, Zhengyuan Wei et al.

Patch robustness certification is an emerging verification approach for defending against adversarial patch attacks with provable guarantees for deep learning systems. Certified recovery techniques guarantee the prediction of the sole true label of a certified sample. However, existing techniques, if applicable to top-k predictions, commonly conduct pairwise comparisons on those votes between labels, failing to certify the sole true label within the top k prediction labels precisely due to the inflation on the number of votes controlled by the attacker (i.e., attack budget); yet enumerating all combinations of vote allocation suffers from the combinatorial explosion problem. We propose CostCert, a novel, scalable, and precise voting-based certified recovery defender. CostCert verifies the true label of a sample within the top k predictions without pairwise comparisons and combinatorial explosion through a novel design: whether the attack budget on the sample is infeasible to cover the smallest total additional votes on top of the votes uncontrollable by the attacker to exclude the true labels from the top k prediction labels. Experiments show that CostCert significantly outperforms the current state-of-the-art defender PatchGuard, such as retaining up to 57.3% in certified accuracy when the patch size is 96, whereas PatchGuard has already dropped to zero.

SEJul 30, 2020
WANA: Symbolic Execution of Wasm Bytecode for Cross-Platform Smart Contract Vulnerability Detection

Dong Wang, Bo Jiang, W. K. Chan

Many popular blockchain platforms are supporting smart contracts for building decentralized applications. However, the vulnerabilities within smart contracts have led to serious financial loss to their end users. For the EOSIO blockchain platform, effective vulnerability detectors are still limited. Furthermore, existing vulnerability detection tools can only support one blockchain platform. In this work, we present WANA, a cross-platform smart contract vulnerability detection tool based on the symbolic execution of WebAssembly bytecode. Furthermore, WANA proposes a set of test oracles to detect the vulnerabilities in EOSIO and Ethereum smart contracts based on WebAssembly bytecode analysis. Our experimental analysis shows that WANA can effectively detect vulnerabilities in both EOSIO and Ethereum smart contracts with high efficiency.

SEJul 29, 2020
EOSFuzzer: Fuzzing EOSIO Smart Contracts for Vulnerability Detection

Yuhe Huang, Bo Jiang, W. K. Chan

EOSIO is one typical public blockchain platform. It is scalable in terms of transaction speeds and has a growing ecosystem supporting smart contracts and decentralized applications. However, the vulnerabilities within the EOSIO smart contracts have led to serious attacks, which caused serious financial loss to its end users. In this work, we systematically analyzed three typical EOSIO smart contract vulnerabilities and their related attacks. Then we presented EOSFuzzer, a general black-box fuzzing framework to detect vulnerabilities within EOSIO smart contracts. In particular, EOSFuzzer proposed effective attacking scenarios and test oracles for EOSIO smart contract fuzzing. Our fuzzing experiment on 3963 EOSIO smart contracts shows that EOSFuzzer is both effective and efficient to detect EOSIO smart contract vulnerabilities with high accuracy.

SEJul 11, 2018
ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection

Bo Jiang, Ye Liu, W. K. Chan

Decentralized cryptocurrencies feature the use of blockchain to transfer values among peers on networks without central agency. Smart contracts are programs running on top of the blockchain consensus protocol to enable people make agreements while minimizing trusts. Millions of smart contracts have been deployed in various decentralized applications. The security vulnerabilities within those smart contracts pose significant threats to their applications. Indeed, many critical security vulnerabilities within smart contracts on Ethereum platform have caused huge financial losses to their users. In this work, we present ContractFuzzer, a novel fuzzer to test Ethereum smart contracts for security vulnerabilities. ContractFuzzer generates fuzzing inputs based on the ABI specifications of smart contracts, defines test oracles to detect security vulnerabilities, instruments the EVM to log smart contracts runtime behaviors, and analyzes these logs to report security vulnerabilities. Our fuzzing of 6991 smart contracts has flagged more than 459 vulnerabilities with high precision. In particular, our fuzzing tool successfully detects the vulnerability of the DAO contract that leads to USD 60 million loss and the vulnerabilities of Parity Wallet that have led to the loss of $30 million and the freezing of USD 150 million worth of Ether.

SESep 23, 2015
To What Extent Is Stress Testing of Android TV Applications Automated in Industrial Environments?

Bo Jiang, Peng Chen, W. K. Chan et al.

An Android-based smart Television (TV) must reliably run its applications in an embedded program environment under diverse hardware resource conditions. Owing to the diverse hardware components used to build numerous TV models, TV simulators are usually not high enough in fidelity to simulate various TV models, and thus are only regarded as unreliable alternatives when stress testing such applications. Therefore, even though stress testing on real TV sets is tedious, it is the de facto approach to ensure the reliability of these applications in the industry. In this paper, we study to what extent stress testing of smart TV applications can be fully automated in the industrial environments. To the best of our knowledge, no previous work has addressed this important question. We summarize the find-ings collected from 10 industrial test engineers to have tested 20 such TV applications in a real production environment. Our study shows that the industry required test automation supports on high-level GUI object controls and status checking, setup of resource conditions and the interplay between the two. With such supports, 87% of the industrial test specifications of one TV model can be fully automated and 71.4% of them were found to be fully reusable to test a subsequent TV model with major up-grades of hardware, operating system and application. It repre-sents a significant improvement with margins of 28% and 38%, respectively, compared to stress testing without such supports.