SoK: Unintended Interactions among Machine Learning Defenses and Risks
This work addresses a critical gap in understanding how a defense against one machine learning risk can inadvertently affect other risks. Closing this gap matters to researchers and practitioners building robust ML systems.
The paper tackles unintended interactions between machine learning defenses and risks to security, privacy, and fairness. It presents a framework, based on overfitting and memorization, that explains these interactions, and it empirically validates two new conjectures derived from that framework.
Machine learning (ML) models cannot neglect risks to security, privacy, and fairness. Several defenses have been proposed to mitigate such risks. When a defense is effective in mitigating one risk, it may correspond to increased or decreased susceptibility to other risks. Existing research lacks an effective framework to recognize and explain these unintended interactions. We present such a framework, based on the conjecture that overfitting and memorization underlie unintended interactions. We survey existing literature on unintended interactions, accommodating them within our framework. We use our framework to conjecture on two previously unexplored interactions, and empirically validate our conjectures.
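The abstract does not say how overfitting or memorization are measured. As a minimal sketch, assuming two common proxies that are not taken from the paper, overfitting can be read as the gap between train and test accuracy. Memorization of training example i can be read as the leave-one-out score popularized by Feldman (2020): how much more likely the model is to label example i correctly when i is in the training set than when it is removed. The toy 1-nearest-neighbour learner and all names below (`one_nn_fit`, `generalization_gap`, `memorization_score`) are hypothetical. A deterministic learner is used only because it makes both quantities easy to check by hand.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Example = Tuple[Point, int]
Model = Callable[[Point], int]


def one_nn_fit(data: Sequence[Example]) -> Model:
    """Hypothetical toy learner (an assumption, not the paper's model):
    a deterministic 1-nearest-neighbour classifier."""
    stored = list(data)

    def predict(x: Point) -> int:
        # Return the label of the closest stored point (squared Euclidean distance).
        _, label = min(stored, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return label

    return predict


def accuracy(model: Model, data: Sequence[Example]) -> float:
    # Fraction of examples whose predicted label matches the true label.
    return sum(model(x) == y for x, y in data) / len(data)


def generalization_gap(fit: Callable[[Sequence[Example]], Model],
                       train: List[Example], test: List[Example]) -> float:
    # Assumed overfitting proxy: train accuracy minus test accuracy.
    model = fit(train)
    return accuracy(model, train) - accuracy(model, test)


def memorization_score(fit: Callable[[Sequence[Example]], Model],
                       train: List[Example], i: int) -> float:
    # Assumed memorization proxy (leave-one-out, in the style of Feldman 2020):
    # P[correct on example i | trained with i] - P[correct on i | trained without i].
    # For a deterministic learner each probability is simply 0 or 1.
    x_i, y_i = train[i]
    with_i = fit(train)(x_i) == y_i
    without_i = fit(train[:i] + train[i + 1:])(x_i) == y_i
    return float(with_i) - float(without_i)


# Usage: (0.05,) labelled 1 is an outlier sitting among class-0 points.
train = [((0.0,), 0), ((0.2,), 0), ((1.0,), 1), ((1.2,), 1), ((0.05,), 1)]
test = [((0.08,), 0), ((0.9,), 1)]
print(generalization_gap(one_nn_fit, train, test))  # 0.5: the outlier misleads a test point
print(memorization_score(one_nn_fit, train, 4))     # 1.0: the outlier is memorized
print(memorization_score(one_nn_fit, train, 2))     # 0.0: a typical point is not
```

With a randomized learner, such as a network trained with SGD, each probability in the memorization score would instead be estimated by averaging over several training runs.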