LG | Jan 26

Explainability Methods for Hardware Trojan Detection: A Systematic Comparison

Paul Whitten, Francis Wolff, Chris Papachristou

arXiv:2601.18696v1 | h-index: 2

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for interpretable explanations that hardware security engineers can validate and act on. It is incremental, however: it compares existing methods rather than introducing a new paradigm.

This work systematically compares three categories of explainability methods for hardware trojan detection on the Trust-Hub benchmark. It finds that property-based and case-based approaches offer domain-aligned interpretability, and its XGBoost classifier reaches 46.15% precision, a 9-fold improvement over the 5.13% reported in prior work.
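The "9-fold" figure can be checked directly from the two precision values quoted here. The snippet below is only an arithmetic check, with variable names chosen for illustration; the false-positive-rate figures are taken from the abstract.

```python
# Arithmetic check of the improvement factors, using only numbers quoted in the text.
reported_precision = 46.15  # percent, this work
prior_precision = 5.13      # percent, Hasegawa et al. as cited in the abstract
precision_ratio = reported_precision / prior_precision

reported_fpr = 0.25         # percent, this work
prior_fpr = 5.6             # percent, prior work
fpr_ratio = prior_fpr / reported_fpr

print(f"precision improvement: {precision_ratio:.2f}x")
print(f"false positive rate reduction: {fpr_ratio:.1f}x")
```

The precision ratio rounds to 9, which matches the summary's claim.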

Hardware trojan detection requires accurate identification and interpretable explanations for security engineers to validate and act on results. This work compares three explainability categories for gate-level trojan detection on the Trust-Hub benchmark: (1) domain-aware property-based analysis of 31 circuit-specific features from gate fanin patterns, flip-flop distances, and I/O connectivity; (2) case-based reasoning using k-nearest neighbors for precedent-based explanations; and (3) model-agnostic feature attribution (LIME, SHAP, gradient). Results show different advantages per approach. Property-based analysis provides explanations through circuit concepts like "high fanin complexity near outputs indicates potential triggers." Case-based reasoning achieves 97.4% correspondence between predictions and training exemplars, offering justifications grounded in precedent. LIME and SHAP provide feature attributions with strong inter-method correlation (r=0.94, p<0.001) but lack circuit-level context for validation. XGBoost classification achieves 46.15% precision and 52.17% recall on 11,392 test samples, a 9-fold precision improvement over prior work (Hasegawa et al.: 5.13%) while reducing false positive rates from 5.6% to 0.25%. Gradient-based attribution runs 481 times faster than SHAP but provides similar domain-opaque insights. This work demonstrates that property-based and case-based approaches offer domain alignment and precedent-based interpretability compared to generic feature rankings, with implications for XAI deployment where practitioners must validate ML predictions.
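To make the case-based reasoning idea concrete, here is a minimal sketch of a k-nearest-neighbor explanation: a prediction is justified by pointing at the closest labeled training cases. It is not the authors' implementation. The function name `knn_explain`, the two toy features (gate fanin and distance to the nearest output), and all data values are hypothetical and chosen only for illustration; the paper itself uses 31 circuit-specific features.

```python
import math
from collections import Counter

def knn_explain(query, train_X, train_y, k=3):
    """Return (predicted_label, neighbors) for a query feature vector.

    neighbors is a list of (training_index, label, distance) tuples for the
    k closest training cases, which serve as the precedents for the prediction.
    """
    # Euclidean distance to every training case; ties break on index.
    dists = sorted((math.dist(query, x), idx) for idx, x in enumerate(train_X))
    neighbors = [(idx, train_y[idx], d) for d, idx in dists[:k]]
    # Majority vote among the retrieved precedents.
    votes = Counter(label for _, label, _ in neighbors)
    predicted = votes.most_common(1)[0][0]
    return predicted, neighbors

# Hypothetical toy data: (fanin, distance_to_output) per gate.
train_X = [(2, 5), (3, 6), (2, 4), (9, 1), (8, 1), (10, 2)]
train_y = ["normal", "normal", "normal", "trojan", "trojan", "trojan"]

predicted, neighbors = knn_explain((9, 2), train_X, train_y, k=3)
print("prediction:", predicted)
for idx, label, dist in neighbors:
    print(f"  precedent #{idx}: features={train_X[idx]} label={label} distance={dist:.2f}")
```

An engineer reading this output sees which known gates the flagged gate resembles, rather than a ranked list of feature weights. One plausible way to measure the correspondence the abstract mentions would be to compare a separate classifier's prediction against this neighbor majority label, though the abstract does not specify how its 97.4% figure is computed.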
