SE, LG · May 5, 2024

Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy

Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Bowen Xu, Premkumar Devanbu, Mohammad Amin Alipour

arXiv:2405.02828v1 · 3.33 citations · h-index: 23

Originality: Synthesis-oriented

AI Analysis

This paper addresses security risks for software developers who rely on opaque LLMs, but its contribution is incremental: it reviews and organizes existing knowledge rather than proposing new attacks or defenses.

This work tackles the problem of trojan attacks on large language models of code by reviewing state-of-the-art attacks and introducing a novel trigger taxonomy framework. It offers a unified definition of the area's fundamental concepts and draws out what findings on how code models learn imply for trigger design.

Large language models (LLMs) have provided many exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential security risks, as adversaries can train and deploy compromised models to disrupt the software development process in the victims' organization. This work presents an overview of the current state-of-the-art trojan attacks on large language models of code, with a focus on triggers -- the main design point of trojans -- with the aid of a novel unifying trigger taxonomy framework. We also aim to provide a uniform definition of the fundamental concepts in the area of trojans in Code LLMs. Finally, we draw out the implications for trigger design of findings on how code models learn.
