ExpProof: Operationalizing Explanations for Confidential Models with ZKPs
This work addresses the challenge of keeping explanations reliable in high-stakes, adversarial settings where the parties involved may manipulate them. Reliable explanations are a prerequisite for regulatory compliance and for trust in AI systems.
The paper makes explanations for machine learning models trustworthy in adversarial settings by using Zero-Knowledge Proofs (ZKPs): it adapts LIME into ZKP-amenable variants and evaluates them on Neural Networks and Random Forests.
In principle, explanations are intended to increase trust in machine learning models and are often mandated by regulation. However, many of the circumstances in which explanations are demanded are adversarial in nature: the parties involved have misaligned interests and are incentivized to manipulate explanations to serve their own ends. As a result, explainability methods fail to be operational in such settings despite the demand \cite{bordt2022post}. In this paper, we take a step towards operationalizing explanations in adversarial scenarios with Zero-Knowledge Proofs (ZKPs), a cryptographic primitive. Specifically, we explore ZKP-amenable versions of the popular explainability algorithm LIME and evaluate their performance on Neural Networks and Random Forests. Our code is publicly available at https://github.com/emlaufer/ExpProof.
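
For readers unfamiliar with LIME, the following minimal sketch illustrates the standard, non-cryptographic algorithm that the ZKP-amenable variants start from: sample points around the input, weight them by proximity, and fit a weighted linear surrogate whose coefficients serve as the explanation. It is not the ExpProof implementation and involves no ZKP machinery. The function name `lime_explain`, the Gaussian sampling, the exponential kernel, and all default parameter values are assumptions chosen for illustration.

```python
import numpy as np

def lime_explain(predict, x, num_samples=500, sigma=0.5,
                 kernel_width=0.75, ridge=1e-3, seed=0):
    """Illustrative LIME-style local surrogate (hypothetical helper).

    predict: callable mapping an (n, d) array to an (n,) array of outputs.
    x:       the (d,) input point to explain.
    Returns (feature_coefficients, intercept) of the local linear model.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # 1. Sample perturbed points around x (Gaussian sampling is an assumption).
    Z = x + sigma * rng.standard_normal((num_samples, d))

    # 2. Query the model on the perturbed points.
    y = predict(Z)

    # 3. Weight each sample by its proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2))

    # 4. Fit a weighted ridge regression; the intercept is not regularized.
    A = np.hstack([np.ones((num_samples, 1)), Z - x])
    reg = ridge * np.eye(d + 1)
    reg[0, 0] = 0.0
    coef = np.linalg.solve(A.T @ (w[:, None] * A) + reg, A.T @ (w * y))

    return coef[1:], coef[0]


if __name__ == "__main__":
    # Toy linear model: the surrogate should recover its weights closely.
    w_true = np.array([2.0, -1.0])
    model = lambda Z: Z @ w_true + 0.5
    attributions, intercept = lime_explain(model, np.array([1.0, 3.0]))
    print(attributions, intercept)
```

In an adversarial setting, each of these steps (sampling, model queries, kernel weighting, and the regression solve) is a point at which a dishonest party could deviate, which is why the paper considers proving the computation in zero knowledge.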