LG · Mar 21, 2025

Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers

arXiv:2503.17172v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses a critical fairness issue in AI safety by improving the worst-class certified robustness of smoothed classifiers, though it is an incremental advance that builds on prior adversarial-robustness research.

The paper tackles the problem of robust fairness in deep neural networks by addressing disparities in worst-class certified robustness for smoothed classifiers, introducing a regularization method that reduces worst-class error by up to 15% on CIFAR-10 and improves certified robustness across multiple datasets.
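To make "worst-class error" and "worst-class certified robustness" concrete, here is a minimal sketch that computes class-wise certified accuracy from per-example certification results and takes the minimum over classes. It assumes each test point has already been certified (for example by the CERTIFY procedure of Cohen et al., 2019), yielding a predicted label or an abstention plus a certified ℓ2 radius. The record format and the function names `class_wise_certified_accuracy` and `worst_class_certified_error` are hypothetical illustrations, not the paper's code.

```python
def class_wise_certified_accuracy(records, radius, num_classes):
    """Per-class certified accuracy of a smoothed classifier at a given radius.

    records: iterable of (true_label, predicted_label, certified_radius),
    with predicted_label=None when certification abstains (hypothetical format).
    A point counts as certified-correct if the prediction matches the label
    and its certified radius is at least `radius`.
    """
    correct = [0] * num_classes
    total = [0] * num_classes
    for y, pred, r in records:
        total[y] += 1
        # None never equals an integer label, so abstentions count as errors.
        if pred == y and r >= radius:
            correct[y] += 1
    if any(t == 0 for t in total):
        raise ValueError("every class needs at least one test point")
    return [c / t for c, t in zip(correct, total)]


def worst_class_certified_error(records, radius, num_classes):
    """One minus the lowest per-class certified accuracy at `radius`."""
    return 1.0 - min(class_wise_certified_accuracy(records, radius, num_classes))


if __name__ == "__main__":
    # Toy certification results for a 2-class problem (made-up numbers).
    results = [(0, 0, 0.5), (0, 0, 0.1), (0, 1, 0.9),
               (1, 1, 0.6), (1, 1, 0.3), (1, None, 0.0)]
    for r in (0.0, 0.25, 0.5):
        print(r, class_wise_certified_accuracy(results, r, 2),
              worst_class_certified_error(results, r, 2))
```

At radius 0 this reduces to the worst-class error of the smoothed classifier itself, with abstentions counted as errors; reporting it next to average certified accuracy exposes the class-wise disparities the paper targets.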

Recent studies have identified a critical challenge in deep neural networks (DNNs) known as "robust fairness", where models exhibit significant disparities in robust accuracy across different classes. While prior work has attempted to address this issue in adversarial robustness, the study of worst-class certified robustness for smoothed classifiers remains unexplored. Our work bridges this gap by developing a PAC-Bayesian bound for the worst-class error of smoothed classifiers. Through theoretical analysis, we demonstrate that the largest eigenvalue of the smoothed confusion matrix fundamentally influences the worst-class error of smoothed classifiers. Based on this insight, we introduce a regularization method that optimizes the largest eigenvalue of the smoothed confusion matrix to enhance the worst-class accuracy of the smoothed classifier and further improve its worst-class certified robustness. We provide extensive experimental validation across multiple datasets and model architectures to demonstrate the effectiveness of our approach.
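The quantity at the center of the method, the largest eigenvalue of a smoothed confusion matrix, can be illustrated with a forward-only sketch. The construction here is an assumption, not the paper's definition: entry C[i][j] is taken as the average, over class-i inputs and Gaussian noise draws, of the softmax probability the base model assigns to class j, with the diagonal zeroed so that only error mass remains. Because such a matrix is nonnegative, its largest eigenvalue is its Perron root, which the sketch estimates by power iteration on C + I (the identity shift keeps the iteration from oscillating). The names `smoothed_confusion_matrix`, `largest_eigenvalue`, `regularized_loss`, and `weight` are hypothetical; an actual training loop would compute this term with an autodiff framework so gradients flow back to the model.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_confusion_matrix(base_logits_fn, data, num_classes, sigma,
                              n_samples, rng, zero_diagonal=True):
    """Monte Carlo estimate of a hypothetical smoothed confusion matrix.

    C[i][j] averages, over class-i inputs and noise drawn from N(0, sigma^2 I),
    the softmax probability the base model gives to class j.
    With zero_diagonal=True only the off-diagonal (error) mass is kept.
    """
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in data:
        counts[y] += 1
        for _ in range(n_samples):
            noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
            probs = softmax(base_logits_fn(noisy))
            for j in range(num_classes):
                sums[y][j] += probs[j] / n_samples
    C = [[sums[i][j] / counts[i] if counts[i] else 0.0
          for j in range(num_classes)] for i in range(num_classes)]
    if zero_diagonal:
        for i in range(num_classes):
            C[i][i] = 0.0
    return C


def largest_eigenvalue(C, iters=1000):
    """Estimate the Perron root of a nonnegative square matrix.

    Power iteration runs on C + I: every other eigenvalue of C + I has
    strictly smaller modulus, so the estimate does not oscillate.
    """
    n = len(C)
    shifted = [[C[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = [1.0 / n] * n  # positive start vector whose entries sum to 1
    estimate = 1.0
    for _ in range(iters):
        w = [sum(shifted[i][j] * v[j] for j in range(n)) for i in range(n)]
        estimate = sum(w)  # growth factor, since sum(v) == 1 and w >= 0
        v = [wi / estimate for wi in w]
    return estimate - 1.0  # undo the identity shift


def regularized_loss(task_loss, C, weight):
    """Hypothetical objective: task loss plus weight * largest eigenvalue of C."""
    return task_loss + weight * largest_eigenvalue(C)


if __name__ == "__main__":
    def toy_model(x):
        # Hypothetical 1-D, 2-class base model: predicts class 1 when x > 0.
        return [-4.0 * x[0], 4.0 * x[0]]

    rng = random.Random(0)
    data = [([-0.3], 0), ([0.4], 1), ([-1.0], 0), ([1.2], 1)]
    C = smoothed_confusion_matrix(toy_model, data, 2, sigma=0.5,
                                  n_samples=200, rng=rng)
    print(C, largest_eigenvalue(C), regularized_loss(0.7, C, weight=0.1))
```

Zeroing the diagonal follows a convention common in PAC-Bayesian analyses of confusion matrices; whether the paper uses soft probabilities, hard smoothed predictions, or a different normalization is not stated in this summary.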
