A Majority Invariant Approach to Patch Robustness Certification for Deep Learning Models
This work addresses patch robustness certification for deep learning models, targeting samples that current techniques fail to certify. The contribution appears incremental, as it builds on existing certification frameworks.
The paper tackles the problem of certifying deep learning models against adversarial patches by proposing MajorCert, which certifies a sample by checking the majority invariant of label set combinations across classifiers. This addresses a limitation of existing methods, which fail on samples that do not meet their strict criteria.
Patch robustness certification ensures that no patch within a given bound on a sample can manipulate a deep learning model into predicting a different label. However, existing techniques cannot certify samples that fail to meet their strict criteria at the classifier or patch region level. This paper proposes MajorCert. MajorCert first finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify the sample.
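The three steps described above (collect label sets per patch region, enumerate their combinations element-wise, check the majority invariant) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `majority_label` and `certify`, the input layout (one list of label sets per patch region, one set per classifier), the reading of "majority" as a strict plurality vote, and the treatment of ties as a certification failure are all assumptions made for illustration. How the manipulatable label sets are obtained is outside the scope of the sketch, so they are passed in directly.

```python
from collections import Counter
from itertools import product


def majority_label(labels):
    """Return the label with the strictly highest vote count, or None on a tie.

    Assumption: "majority" is read as a strict plurality vote; the source
    does not specify the exact voting rule.
    """
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie for first place: no unique majority
    return ranked[0][0]


def certify(label_sets_per_region, expected_label):
    """Check the majority invariant over every patch region.

    label_sets_per_region: list over patch regions; each entry is a list with
    one set per classifier, holding the labels that classifier could be
    manipulated to output by a patch in that region (hypothetical layout).
    """
    for label_sets in label_sets_per_region:
        # Enumerate combinations element-wise: one label from each classifier.
        for combination in product(*label_sets):
            if majority_label(combination) != expected_label:
                return False  # some combination breaks the invariant
    return True


# Region A: only the third classifier can be flipped, so label 0 always wins.
region_a = [{0}, {0}, {0, 1}]
# Region B: two classifiers can be flipped together, so label 1 can win.
region_b = [{0}, {0, 1}, {0, 1}]

print(certify([region_a], expected_label=0))
print(certify([region_a, region_b], expected_label=0))
```

In this sketch a sample is certified only if every combination in every patch region yields the same majority label. The enumeration grows with the product of the label set sizes, so it is meant as an illustration of the check rather than an efficient procedure.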