Learning Robust and Privacy-Preserving Representations via Information Theory
This work addresses security and privacy vulnerabilities in machine learning models, a critical issue for deploying AI in sensitive domains, but it appears incremental because it builds on existing representation-learning and information-theoretic concepts.
The paper tackles the vulnerability of machine learning models to security and privacy attacks. It proposes an information-theoretic framework for learning representations that are robust to adversarial examples and to attribute inference while maintaining task utility, and it derives theoretical results on the trade-offs between these goals and guarantees on attribute privacy leakage.
Machine learning models are vulnerable to both security attacks (e.g., adversarial examples) and privacy attacks (e.g., private attribute inference). We take the first step toward mitigating both security and privacy attacks while also maintaining task utility. In particular, we propose an information-theoretic framework that achieves these goals through the lens of representation learning, i.e., by learning representations that are robust to both adversarial examples and attribute inference adversaries. We also derive novel theoretical results under our framework, e.g., the inherent trade-off between adversarial robustness/utility and attribute privacy, and guarantees on attribute privacy leakage against attribute inference adversaries.
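The abstract does not state the framework's objective. As a hypothetical illustration only, assuming (as is common in information-theoretic privacy work) that utility and attribute leakage are measured by the mutual information between a representation z and, respectively, the task label y and the private attribute s, the sketch below computes plug-in estimates of both quantities on discrete toy samples. The function name `mutual_information` and the toy data are assumptions, not the paper's method, and the sketch does not model adversarial robustness at all.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # empirical joint counts
    px = Counter(xs)              # empirical marginal counts of X
    py = Counter(ys)              # empirical marginal counts of Y
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # Sum p(x, y) * log2(p(x, y) / (p(x) p(y))) over observed pairs.
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical toy data: task label y and private attribute s are independent.
y = [0, 0, 1, 1, 0, 0, 1, 1]
s = [0, 1, 0, 1, 0, 1, 0, 1]

# A "leaky" representation encodes both y and s; a "private" one encodes only y.
z_leaky = [2 * yi + si for yi, si in zip(y, s)]
z_private = list(y)

for name, z in [("leaky", z_leaky), ("private", z_private)]:
    print(name, "utility I(z;y) =", mutual_information(z, y),
          "leakage I(z;s) =", mutual_information(z, s))
```

On this toy data both representations keep 1 bit of task information, but the private one leaks 0 bits about s while the leaky one leaks 1 bit. Plug-in estimates are biased on small samples, so learned continuous representations would typically call for a different estimator.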