SEAIJul 18, 2023

CertPri: Certifiable Prioritization for Deep Neural Networks via Movement Cost in Feature Space

arXiv:2307.09375v19 citationsh-index: 29
Originality Highly original
AI Analysis

This addresses the need for certifiable and effective prioritization methods to enhance DNN quality in software systems, representing a novel method for a known bottleneck.

The paper tackles the problem of test input prioritization for deep neural networks to identify bug-revealing inputs earlier, proposing CertPri which improves prioritization effectiveness by 53.97% on average compared to baselines.

Deep neural networks (DNNs) have demonstrated their outperformance in various software systems, but also exhibit misbehavior and even result in irreversible disasters. Therefore, it is crucial to identify the misbehavior of DNN-based software and improve DNNs' quality. Test input prioritization is one of the most appealing ways to guarantee DNNs' quality, which prioritizes test inputs so that more bug-revealing inputs can be identified earlier with limited time and manual labeling efforts. However, the existing prioritization methods are still limited from three aspects: certifiability, effectiveness, and generalizability. To overcome the challenges, we propose CertPri, a test input prioritization technique designed based on a movement cost perspective of test inputs in DNNs' feature space. CertPri differs from previous works in three key aspects: (1) certifiable: it provides a formal robustness guarantee for the movement cost; (2) effective: it leverages formally guaranteed movement costs to identify malicious bug-revealing inputs; and (3) generic: it can be applied to various tasks, data, models, and scenarios. Extensive evaluations across 2 tasks (i.e., classification and regression), 6 data forms, 4 model structures, and 2 scenarios (i.e., white-box and black-box) demonstrate CertPri's superior performance. For instance, it significantly improves 53.97% prioritization effectiveness on average compared with baselines. Its robustness and generalizability are 1.41~2.00 times and 1.33~3.39 times that of baselines on average, respectively.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes