LG CR
Oct 18, 2022

Towards Fair Classification against Poisoning Attacks

Han Xu, Xiaorui Liu, Yuxuan Wan, Jiliang Tang

arXiv:2210.09503v13.34 citations
h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses how to secure fair classification systems against adversarial data poisoning, which matters to users who rely on equitable AI decisions. It is an incremental advance: it adapts existing poisoning defenses to the fair-classification setting.

The paper tackles the vulnerability of fair classification models to poisoning attacks, in which attackers insert malicious training samples to degrade both accuracy and fairness. The authors propose a defense framework that improves robustness and, in their experiments, achieves better accuracy and fairness than baseline methods.

Fair classification aims to constrain classification models so that they achieve equality (of treatment or of prediction quality) across different sensitive groups. However, fair classification is at risk from poisoning attacks, which deliberately insert malicious training samples to manipulate the trained classifier's performance. In this work, we study a poisoning scenario in which the attacker can insert a small fraction of samples into the training data, with arbitrary sensitive attributes as well as other predictive features. We demonstrate that fairly trained classifiers can be highly vulnerable to such poisoning attacks, showing a much worse accuracy-fairness trade-off even when we apply some of the most effective defenses (originally proposed for traditional classification tasks). As a countermeasure, we propose a general and theoretically guaranteed framework that adapts traditional defense methods to fair classification under poisoning attacks. Extensive experiments validate that the proposed defense framework achieves better robustness, in terms of accuracy and fairness, than representative baseline methods.
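The threat model in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' attack or defense: the names `Sample`, `poison`, and `demographic_parity_gap` are hypothetical, the sensitive attribute is assumed to be binary, and demographic parity is used only as one common fairness measure (the abstract does not say which fairness criterion the paper adopts).

```python
from typing import List, Tuple

# Hypothetical record layout: (features x, sensitive attribute z, label y).
Sample = Tuple[List[float], int, int]


def poison(clean: List[Sample], injected: List[Sample], eps: float) -> List[Sample]:
    """Return the training set after the attacker appends samples.

    The attacker may add at most eps * len(clean) samples and is free to
    choose their features, sensitive attributes, and labels.
    """
    budget = int(eps * len(clean))
    return clean + injected[:budget]


def demographic_parity_gap(preds: List[int], groups: List[int]) -> float:
    """Absolute difference in positive-prediction rate between groups 0 and 1.

    Assumption: a binary sensitive attribute; a gap of 0.0 means both groups
    receive positive predictions at the same rate.
    """
    rates = []
    for g in (0, 1):
        members = [p for p, z in zip(preds, groups) if z == g]
        rates.append(sum(members) / len(members) if members else 0.0)
    return abs(rates[0] - rates[1])


if __name__ == "__main__":
    clean = [([0.0], 0, 0), ([1.0], 0, 1), ([0.0], 1, 0), ([1.0], 1, 1)] * 2
    # Attacker-chosen points: arbitrary features, group, and label.
    injected = [([1.0], 1, 0), ([1.0], 1, 0), ([1.0], 1, 0)]
    poisoned = poison(clean, injected, eps=0.25)
    print(len(clean), len(poisoned))
    print(demographic_parity_gap([1, 1, 0, 0], [0, 0, 1, 1]))
```

Comparing a classifier's accuracy and fairness gap when trained on `clean` versus `poisoned` is one simple way to observe the accuracy-fairness degradation the abstract describes.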
