Changho Seo

9.2LGSep 23, 2024Code

SDBA: A Stealthy and Long-Lasting Durable Backdoor Attack in Federated Learning

Minyeong Choe, Cheolhee Park, Changho Seo et al.

Federated learning is a promising approach for training machine learning models while preserving data privacy. However, its distributed nature makes it vulnerable to backdoor attacks, particularly in NLP tasks, where related research remains limited. This paper introduces SDBA, a novel backdoor attack mechanism designed for NLP tasks in federated learning environments. Through a systematic analysis across LSTM and GPT-2 models, we identify the most vulnerable layers for backdoor injection and achieve both stealth and long-lasting durability by applying layer-wise gradient masking and top-k% gradient masking. Also, to evaluate the task generalizability of SDBA, we additionally conduct experiments on the T5 model. Experiments on next-token prediction, sentiment analysis, and question answering tasks show that SDBA outperforms existing backdoors in terms of durability and effectively bypasses representative defense mechanisms, demonstrating notable performance in transformer-based models such as GPT-2. These results highlight the urgent need for robust defense strategies in NLP-based federated learning systems.

13.0CLSep 10, 2025

Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms

Minyeong Choe, Haehyun Cho, Changho Seo et al.

Understanding how Transformer-based language models store and retrieve factual associations is critical for improving interpretability and enabling targeted model editing. Prior work, primarily on GPT-style models, has identified MLP modules in early layers as key contributors to factual recall. However, it remains unclear whether these findings generalize across different autoregressive architectures. To address this, we conduct a comprehensive evaluation of factual recall across several models -- including GPT, LLaMA, Qwen, and DeepSeek -- analyzing where and how factual information is encoded and accessed. Consequently, we find that Qwen-based models behave differently from previous patterns: attention modules in the earliest layers contribute more to factual recall than MLP modules. Our findings suggest that even within the autoregressive Transformer family, architectural variations can lead to fundamentally different mechanisms of factual recall.

Changho Seo

2 Papers