Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples
This work addresses security vulnerabilities in few-shot learning systems, which matter for applications where labeled data is scarce. The approach is incremental, however, as it builds on existing detection concepts.
The authors tackle adversarial attacks on few-shot classifiers by proposing a simple, attack-agnostic detection method based on self-similarity and filtering. The method achieves good detection performance on the miniImagenet and CUB datasets across different classifiers and attack strengths, and it outperforms the baselines.
Few-shot classifiers have shown promising results in use cases where user-provided labels are scarce. These models learn to predict novel classes simply by training on a non-overlapping set of classes, which can be largely attributed to how their mechanisms differ from those of conventional deep networks. However, this difference also gives attackers new opportunities to mount integrity attacks against such models, opportunities that are not present in other machine learning setups. In this work, we aim to close this gap by studying a conceptually simple approach to defending few-shot classifiers against adversarial attacks. More specifically, we propose a simple attack-agnostic detection method, using the concepts of self-similarity and filtering, to flag adversarial support sets that destroy a victim classifier's understanding of a certain class. Our extended evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance across three different few-shot classifiers and across different attack strengths, beating baselines. These results establish our approach as a strong detection method for support set poisoning attacks. We also show that our approach constitutes a generalizable concept, as it can be paired with other filtering functions. Finally, we provide an analysis of our results when we vary two components of our detection approach.
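The abstract names self-similarity and filtering but does not spell out the mechanics, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that a clean support set keeps roughly the same class prototype after filtering, while a heavily perturbed one does not. Everything specific here is a hypothetical stand-in: the box blur as the filtering function, the flattening `embed` in place of a real few-shot feature extractor, the cosine score, and the threshold value.

```python
import numpy as np

def box_filter(images, k=3):
    """Hypothetical filtering function: k x k mean blur over (N, H, W) images."""
    pad = k // 2
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(images, dtype=float)
    h, w = images.shape[1:]
    # Sum the k*k shifted views, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def embed(images):
    """Assumed stand-in for a few-shot classifier's feature extractor."""
    return images.reshape(len(images), -1).astype(float)

def self_similarity(support, filter_fn=box_filter):
    """Cosine similarity between the class prototype of the raw support set
    and the prototype of its filtered copy (an assumed scoring rule)."""
    raw = embed(support).mean(axis=0)
    filt = embed(filter_fn(support)).mean(axis=0)
    denom = np.linalg.norm(raw) * np.linalg.norm(filt) + 1e-12
    return float(raw @ filt / denom)

def flag_support_set(support, threshold=0.9, filter_fn=box_filter):
    """Flag the support set as suspicious when its score drops below threshold."""
    return self_similarity(support, filter_fn) < threshold

# Toy data: smooth "clean" images versus the same images with a
# high-frequency checkerboard perturbation (an illustrative attack proxy).
x = np.linspace(0.0, 1.0, 8)
clean = np.repeat(np.add.outer(x, x)[None], 5, axis=0)
checker = (-1.0) ** np.add.outer(np.arange(8), np.arange(8))
perturbed = clean + 2.0 * checker

print(self_similarity(clean), self_similarity(perturbed))
print(flag_support_set(clean), flag_support_set(perturbed))
```

Because `filter_fn` is a parameter, a different filtering function can be passed in without touching the scoring logic, which mirrors the abstract's point that the concept can be paired with other filtering functions. In practice the threshold would need to be calibrated on clean support sets for the classifier in question.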