A Fundamental Performance Limitation for Adversarial Classification
This addresses a critical gap in ensuring reliable performance for machine learning systems used in high-stakes applications, providing foundational insights into inherent limitations.
The paper tackles the problem of provable guarantees for binary classification algorithms under adversarial manipulation, showing that optimizing accuracy inevitably increases sensitivity to such attacks, with the tradeoff curve depending solely on data statistics and not improvable by algorithm tuning.
Despite the widespread use of machine learning algorithms to solve problems of technological, economic, and social relevance, provable guarantees on the performance of these data-driven algorithms are critically lacking, especially when the data originates from unreliable sources and is transmitted over unprotected and easily accessible channels. In this paper we take an important step to bridge this gap and formally show that, in a quest to optimize their accuracy, binary classification algorithms -- including those based on machine-learning techniques -- inevitably become more sensitive to adversarial manipulation of the data. Further, for a given class of algorithms with the same complexity (i.e., number of classification boundaries), the fundamental tradeoff curve between accuracy and sensitivity depends solely on the statistics of the data, and cannot be improved by tuning the algorithm.