Enhancing Robust Fairness via Confusional Spectral Regularization
This work addresses robust fairness in deep learning, an issue that is crucial for reliable AI systems, but the contribution is incremental because it builds on existing regularization methods.
The paper tackles the problem of robust fairness in deep neural networks, where robust accuracy varies across classes. It proposes a novel regularization technique that targets the spectral norm of the robust confusion matrix, and experiments on various datasets and models show improved worst-class robust accuracy.
Recent research has highlighted a critical issue known as "robust fairness", where robust accuracy varies significantly across different classes, undermining the reliability of deep neural networks (DNNs). A common approach to address this has been to dynamically reweight classes during training, giving more weight to those with lower empirical robust performance. However, we find that class-wise robust performance diverges between the training set and the test set, which limits the effectiveness of these explicit reweighting methods and indicates the need for a principled alternative. In this work, we derive a robust generalization bound for the worst-class robust error within the PAC-Bayesian framework, accounting for unknown data distributions. Our analysis shows that the worst-class robust error is influenced by two main factors: the spectral norm of the empirical robust confusion matrix and the information embedded in the model and training set. While the latter has been extensively studied, we propose a novel regularization technique targeting the spectral norm of the robust confusion matrix to improve worst-class robust accuracy and enhance robust fairness. We validate our approach through comprehensive experiments on various datasets and models, demonstrating its effectiveness in enhancing robust fairness.
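To make the central quantity concrete, the following minimal sketch computes an empirical robust confusion matrix from labels and adversarial predictions and then takes its spectral norm. Several things here are assumptions rather than details from the abstract: the matrix is taken to be row-normalized with its diagonal zeroed, so that entry (i, j) is the fraction of class-i examples misclassified as class j under attack; the function names `robust_confusion_matrix` and `spectral_norm` are hypothetical; and the toy labels are made up for illustration. The sketch uses hard predictions, so it only measures the quantity; a training-time regularizer would need a differentiable surrogate, which is not shown.

```python
import numpy as np


def robust_confusion_matrix(y_true, y_adv_pred, num_classes):
    """Empirical robust confusion matrix (assumed convention).

    Entry (i, j) is the fraction of class-i examples whose adversarial
    prediction is class j. The diagonal is zeroed so that only errors
    remain (an assumption; the abstract does not give the definition).
    """
    counts = np.zeros((num_classes, num_classes), dtype=float)
    # Accumulate one count per (true label, adversarial prediction) pair.
    np.add.at(counts, (y_true, y_adv_pred), 1.0)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Row-normalize, leaving rows of absent classes at zero.
    rates = np.divide(
        counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0
    )
    np.fill_diagonal(rates, 0.0)
    return rates


def spectral_norm(matrix):
    """Largest singular value of the matrix."""
    return np.linalg.norm(matrix, ord=2)


# Toy data (illustrative only): 3 classes, 4 examples each.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
y_adv_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 0, 0, 0, 2])

C = robust_confusion_matrix(y_true, y_adv_pred, num_classes=3)
class_robust_error = C.sum(axis=1)       # per-class robust error rates
worst_class_error = class_robust_error.max()
norm = spectral_norm(C)

print("per-class robust error:", class_robust_error)
print("worst-class robust error:", worst_class_error)
print("spectral norm:", norm)
```

In this toy case each row has a single nonzero entry in a distinct column, so the spectral norm coincides with the worst-class robust error (0.75); in general the two quantities differ.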