CV CR Feb 28, 2024

Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks

Alexander Unnervik, Hatef Otroshi Shahreza, Anjith George, Sébastien Marcel

arXiv:2402.18718v23.72 citations h-index: 12

Originality: Incremental advance

AI Analysis

The paper addresses the need for backdoor detection in biometric and open-set scenarios. It is an incremental advance: it builds on existing detection methods but applies them to a less-studied domain.

The paper tackles the detection of backdoor attacks in open-set classification tasks. It proposes pairing a probe model with a reference model and using embedding translation to compute a similarity score between their embeddings. The authors show that this score can indicate the presence of a backdoor across different architectures and training datasets, and that detection remains possible even when both models are backdoored.

Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm, activated when an attacker-chosen pattern is presented, causing a specific misprediction. The need to identify backdoors in biometric scenarios has led us to propose a novel technique with different trade-offs. In this paper, we propose using model pairs on open-set classification tasks to detect backdoors. Using a simple linear operation to project embeddings from a probe model's embedding space to a reference model's embedding space, we can compare both embeddings and compute a similarity score. We show that this score can be an indicator of the presence of a backdoor despite the models being of different architectures and having been trained independently and on different datasets. This technique allows for the detection of backdoors in models designed for open-set classification tasks, a setting that is little studied in the literature. Additionally, we show that backdoors can be detected even when both models are backdoored. The source code is made available for reproducibility purposes.
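
The embedding translation and similarity score described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. The least-squares fit of the linear map, the use of cosine similarity as the score, the synthetic stand-in embeddings, the simulated backdoor behaviour, and every function and variable name below are assumptions made for illustration.

```python
import numpy as np

def fit_linear_translation(probe_emb, ref_emb):
    """Fit W so that probe_emb @ W approximates ref_emb.

    Assumption: ordinary least squares; the abstract only says
    'a simple linear operation'.
    """
    W, *_ = np.linalg.lstsq(probe_emb, ref_emb, rcond=None)
    return W

def cosine_similarity(a, b, eps=1e-12):
    """Row-wise cosine similarity between two (n, d) arrays."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return np.sum(a_n * b_n, axis=1)

def pair_scores(probe_emb, ref_emb, W):
    """Translate probe embeddings into the reference space and score them."""
    return cosine_similarity(probe_emb @ W, ref_emb)

# Synthetic stand-ins for two independently trained models (hypothetical):
# both embed the same inputs, but into different spaces.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 32))      # shared "identity" information
A = rng.normal(size=(32, 32))            # stand-in probe model
B = rng.normal(size=(32, 24))            # stand-in reference model
probe_clean = latent @ A
ref_clean = latent @ B

# Fit the translation on clean samples, evaluate on held-out samples.
W = fit_linear_translation(probe_clean[:400], ref_clean[:400])
clean_scores = pair_scores(probe_clean[400:], ref_clean[400:], W)

# Simulated backdoor (assumption): the probe model maps every triggered
# input to one fixed embedding, while the reference model is unaffected.
target = rng.normal(size=(1, 32))
probe_triggered = np.repeat(target, 100, axis=0)
triggered_scores = pair_scores(probe_triggered, ref_clean[400:], W)

print("mean clean score:    ", clean_scores.mean())
print("mean triggered score:", triggered_scores.mean())
```

In this toy setup the two models agree after translation on clean inputs and disagree on triggered ones, so a drop in the score flags the suspicious samples. How a decision threshold would be chosen in practice is not stated in the abstract and is left open here.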
