Influence-Guided Concolic Testing of Transformer Robustness
This work addresses robustness testing of Transformer classifiers for debugging and model auditing. The contribution appears incremental: it builds on existing concolic testing methods and adds Transformer-specific enhancements.
The researchers address the problem of efficiently finding inputs that cause Transformer classifiers to make incorrect decisions. They develop an influence-guided concolic tester that uses SHAP-based estimates to prioritize search paths. Under small $L_0$ budgets, it discovers label-flip inputs more efficiently than a FIFO baseline.
Concolic testing for deep neural networks alternates concrete execution with constraint solving to search for inputs that flip decisions. We present an influence-guided concolic tester for Transformer classifiers that ranks path predicates by SHAP-based estimates of their impact on the model output. To enable SMT solving on modern architectures, we prototype a solver-compatible, pure-Python semantics for multi-head self-attention and introduce practical scheduling heuristics that temper constraint growth on deeper models. In a white-box study on compact Transformers under small $L_0$ budgets, influence guidance finds label-flip inputs more efficiently than a FIFO baseline and maintains steady progress on deeper networks. Aggregating successful attack instances with a SHAP-based critical decision path analysis reveals recurring, compact decision logic shared across attacks. These observations suggest that (i) influence signals provide a useful search bias for symbolic exploration, and (ii) solver-friendly attention semantics paired with lightweight scheduling make concolic testing feasible for contemporary Transformer models, offering potential utility for debugging and model auditing.
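The difference between FIFO and influence-guided scheduling can be illustrated with a minimal sketch. It is not the paper's implementation: the `Predicate` record, the predicate names, and the influence scores below are hypothetical, and the scores are assumed to be precomputed (for example, from absolute SHAP-style attributions). The sketch shows only the order in which pending path predicates would be handed to the solver; constraint solving itself is omitted.

```python
import heapq
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Predicate:
    """A pending path predicate (hypothetical record, not from the paper)."""
    name: str
    influence: float  # assumed precomputed, e.g. a SHAP-style attribution


def fifo_order(preds: Sequence[Predicate]) -> List[str]:
    """Baseline: visit predicates in the order they were discovered."""
    queue = deque(preds)
    order = []
    while queue:
        order.append(queue.popleft().name)
    return order


def influence_order(preds: Sequence[Predicate]) -> List[str]:
    """Visit predicates by decreasing absolute influence.

    heapq is a min-heap, so the score is negated. The discovery index
    breaks ties, so equally ranked predicates keep FIFO order.
    """
    heap = [(-abs(p.influence), i, p.name) for i, p in enumerate(preds)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical predicates collected during one concrete execution.
pending = [
    Predicate("relu_3_17 > 0", 0.02),
    Predicate("attn_h1_score_2 >= attn_h1_score_5", -0.41),
    Predicate("relu_1_04 > 0", 0.13),
]

print(fifo_order(pending))
print(influence_order(pending))
```

With these made-up scores, the influence-ranked schedule tries the attention predicate first, whereas FIFO would reach it only after the low-influence ReLU predicate. Using the absolute value is a design choice of this sketch: it treats strongly negative and strongly positive attributions as equally worth negating.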