On existence, uniqueness and scalability of adversarial robustness measures for AI classifiers
This work addresses the need for robust and interpretable adversarial robustness measures in AI safety, particularly for healthcare applications, although its contribution may be incremental because it builds on the established concept of adversarial robustness.
The paper addresses the problem of verifying and computing minimal adversarial paths and distances for AI classifiers, establishing mathematical conditions for their existence, uniqueness, and explicit computation for several model classes. It demonstrates practical applications on synthetic benchmarks and biomedical data, showing how these measures yield unique, minimal risk-mitigating interventions in patient-specific scenarios.
Simply verifiable mathematical conditions for the existence, uniqueness and explicit analytical computation of minimal adversarial paths (MAP) and minimal adversarial distances (MAD) are formulated and proven for (locally) uniquely invertible classifiers, for generalized linear models (GLM), and for entropic AI (EAI). Practical computation of MAP and MAD, together with their comparison and interpretation for various classes of AI tools (neural networks, boosted random forests, GLM and EAI), is demonstrated on common synthetic benchmarks (a double Swiss roll spiral and its extensions) and on two biomedical data problems (health insurance claim prediction and heart attack lethality classification). For the biomedical applications, it is demonstrated how MAP provides unique minimal patient-specific risk-mitigating interventions in predefined subsets of accessible control variables.
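To make the idea of an explicitly computable minimal adversarial distance concrete, the following is a minimal sketch for the simplest case only: a classifier whose decision boundary is the hyperplane w·x + b = 0, as for a GLM with a linear predictor thresholded at a fixed level. It uses the textbook point-to-hyperplane formula under an assumed Euclidean norm and is not taken from the paper, whose definitions and conditions may differ. The function name `linear_mad_map` and the `accessible` argument (indices of the variables that are allowed to change) are hypothetical names introduced for illustration.

```python
import math


def linear_mad_map(w, b, x, accessible=None):
    """Distance and displacement from x to the hyperplane w.x + b = 0.

    Assumptions (not from the paper): linear decision boundary, Euclidean
    norm, and only the coordinates listed in `accessible` may be changed.
    Returns (distance, step), where x + step lies on the boundary.
    """
    idx = list(range(len(w))) if accessible is None else list(accessible)
    # Signed score of the current point relative to the boundary.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Squared norm of the weights restricted to the accessible coordinates.
    norm_sq = sum(w[i] ** 2 for i in idx)
    if norm_sq == 0.0:
        raise ValueError("boundary is unreachable via the accessible variables")
    # Shortest displacement is along the restricted weight vector.
    step = [0.0] * len(w)
    for i in idx:
        step[i] = -score * w[i] / norm_sq
    distance = abs(score) / math.sqrt(norm_sq)
    return distance, step


# Toy example with made-up numbers.
w, b = [3.0, 4.0], -5.0
x = [3.0, 4.0]

# All variables accessible: straight path along w.
dist, step = linear_mad_map(w, b, x)
x_boundary = [xi + si for xi, si in zip(x, step)]

# Only variable 0 accessible: the path is longer, variable 1 stays fixed.
dist_r, step_r = linear_mad_map(w, b, x, accessible=[0])
```

In this linear setting the shortest path to the boundary is a straight segment and is unique whenever the restricted weight vector is nonzero. Restricting the accessible variables can only lengthen the distance, which mirrors the idea of interventions confined to a predefined subset of control variables.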