CR | AI | Jan 7, 2025

TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning

arXiv:2501.04108v2 | h-index: 9 | AAAI
Originality: Incremental advance
AI Analysis

The paper addresses a security vulnerability in self-supervised learning: trojaned encoders that pass their trojan behavior on to downstream classifiers. The advance is incremental because it builds on existing trojan attack and defense research.

The paper tackles the problem of detecting and removing trojan triggers from inputs in self-supervised learning encoders, proposing TrojanDec as a data-free method that identifies trojaned inputs and recovers them, with experiments showing it outperforms state-of-the-art defenses.

An image encoder pre-trained by self-supervised learning can be used as a general-purpose feature extractor to build downstream classifiers for various downstream tasks. However, many studies showed that an attacker can embed a trojan into an encoder such that multiple downstream classifiers built based on the trojaned encoder simultaneously inherit the trojan behavior. In this work, we propose TrojanDec, the first data-free method to identify and recover a test input embedded with a trigger. Given a (trojaned or clean) encoder and a test input, TrojanDec first predicts whether the test input is trojaned. If not, the test input is processed in a normal way to maintain the utility. Otherwise, the test input will be further restored to remove the trigger. Our extensive evaluation shows that TrojanDec can effectively identify the trojan (if any) from a given test input and recover it under state-of-the-art trojan attacks. We further demonstrate by experiments that our TrojanDec outperforms the state-of-the-art defenses.
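The detect-then-recover flow described in the abstract can be sketched as follows. This is a minimal illustration of the control flow only, not the authors' implementation: the abstract does not say how detection or restoration work, so `is_trojaned` and `restore` are hypothetical callables, and the toy encoder, detector, and restorer below are stand-ins invented for the example.

```python
from typing import Callable, List, TypeVar

X = TypeVar("X")  # input type, e.g. an image
E = TypeVar("E")  # embedding type produced by the encoder


def defend_and_encode(
    encoder: Callable[[X], E],
    x: X,
    is_trojaned: Callable[[Callable[[X], E], X], bool],  # hypothetical detector
    restore: Callable[[X], X],                           # hypothetical restorer
) -> E:
    """Encode x, restoring it first only if the detector flags it."""
    if is_trojaned(encoder, x):
        # Flagged input: remove the suspected trigger before encoding.
        x = restore(x)
    # Unflagged inputs pass through unchanged, which preserves utility.
    return encoder(x)


# Toy stand-ins (assumptions for illustration only).
TRIGGER_VALUE = 9.0


def toy_encoder(x: List[float]) -> float:
    # Pretend "embedding": the sum of the input values.
    return sum(x)


def toy_detector(encoder: Callable[[List[float]], float], x: List[float]) -> bool:
    # Pretend trigger: a fixed value in the first position.
    return x[0] == TRIGGER_VALUE


def toy_restore(x: List[float]) -> List[float]:
    # Pretend restoration: zero out the triggered position.
    return [0.0] + x[1:]


clean_embedding = defend_and_encode(toy_encoder, [1.0, 1.0, 2.0], toy_detector, toy_restore)
restored_embedding = defend_and_encode(toy_encoder, [9.0, 1.0, 2.0], toy_detector, toy_restore)
```

The defense sits in front of the encoder and needs only the encoder and the test input, which matches the data-free setting the abstract describes. The detector and restorer are left as pluggable components.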

