Don't Lie to Me: Avoiding Malicious Explanations with STEALTH
This work addresses security and fairness issues faced by users of AI models, although it appears incremental, since it builds on existing clustering and query techniques.
The paper tackles malicious behavior (lying) and the associated unfairness in AI-generated models by introducing STEALTH. STEALTH combines recursive bi-clustering with a small number of queries, so that a malicious model can neither detect it nor know when to lie. It needs only one query per data cluster.
STEALTH is a method for using an arbitrary AI-generated model without suffering from malicious attacks (i.e., lying) or the associated unfairness issues. After recursively bi-clustering the data, the STEALTH system asks the AI model a limited number of queries about class labels. STEALTH asks so few queries (one per data cluster) that malicious algorithms can neither (a) detect its operation nor (b) know when to lie.
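To make the query budget concrete, here is a minimal Python sketch of the general idea, under stated assumptions. The splitting rule (a FastMap-style projection onto two distant "pole" rows, followed by a median split), the choice to query each leaf's centroid, and the nearest-centroid rule for labelling new rows are illustrative assumptions, not STEALTH's published implementation. All names (`bicluster`, `build_surrogate`, `min_size`, `black_box`) are hypothetical, and the queried AI model is stood in for by a plain callable.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def bicluster(rows: List[Row], min_size: int, rng: random.Random) -> List[List[Row]]:
    """Recursively split rows in two until each leaf holds fewer than 2 * min_size rows.

    Assumption: a FastMap-style split, not necessarily the paper's exact rule.
    """
    if len(rows) < 2 * min_size:
        return [rows]
    # Pick two distant "poles": the row farthest from a random anchor (east),
    # then the row farthest from east (west).
    anchor = rng.choice(rows)
    east = max(rows, key=lambda r: math.dist(r, anchor))
    west = max(rows, key=lambda r: math.dist(r, east))
    c = math.dist(east, west)
    if c == 0:  # all rows identical: nothing to split on
        return [rows]

    def proj(r: Row) -> float:
        # Position of r along the east-west line, via the cosine rule.
        a, b = math.dist(r, east), math.dist(r, west)
        return (a * a + c * c - b * b) / (2 * c)

    ordered = sorted(rows, key=proj)
    mid = len(ordered) // 2  # median split: both halves have >= min_size rows
    return bicluster(ordered[:mid], min_size, rng) + bicluster(ordered[mid:], min_size, rng)


def centroid(cluster: List[Row]) -> List[float]:
    """Column-wise mean of the rows in one leaf cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def build_surrogate(
    rows: List[Row],
    model: Callable[[List[float]], int],
    min_size: int = 8,
    seed: int = 0,
) -> Tuple[Callable[[Row], int], int]:
    """Query `model` exactly once per leaf cluster; label new rows by nearest leaf centroid."""
    leaves = bicluster(list(rows), min_size, random.Random(seed))
    centroids = [centroid(leaf) for leaf in leaves]
    labels = [model(c) for c in centroids]  # the only calls ever made to the model

    def predict(row: Row) -> int:
        nearest = min(range(len(centroids)), key=lambda k: math.dist(row, centroids[k]))
        return labels[nearest]

    return predict, len(labels)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random(), rng.random()] for _ in range(200)]
    calls = []

    def black_box(x: List[float]) -> int:
        # Hypothetical stand-in for the (possibly malicious) AI model.
        calls.append(x)
        return int(x[0] + x[1] > 1.0)

    predict, n_queries = build_surrogate(data, black_box, min_size=10)
    print("queries:", n_queries, "calls seen by model:", len(calls))
    print("label for [0.9, 0.9]:", predict([0.9, 0.9]))
```

Because a leaf stops splitting once it holds fewer than `2 * min_size` rows, these 200 distinct points with `min_size=10` always produce 16 leaves (200 → 100 → 50 → 25 → 12 + 13), so the model sees only 16 queries. Further predictions come from the surrogate and never reach the model.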