CR LG PL SE · Aug 5, 2024

Black-Box Adversarial Attacks on LLM-Based Code Completion

Slobodan Jenko, Niels Mündler, Jingxuan He, Mark Vero, Martin Vechev

ETH Zurich

arXiv:2408.02509v2


AI Analysis

This addresses a critical security problem for developers relying on code completion tools, as it reveals a practical vulnerability that could lead to widespread insecure code generation.

The paper examines the security of LLM-based code completion engines and shows that black-box adversarial attacks can significantly increase their rate of insecure code generation. The proposed attack, INSEC, raises that rate by more than 50% on test cases spanning 16 CWEs and 5 programming languages, while preserving the functional correctness of the generated code.

Modern code completion engines, powered by large language models (LLMs), assist millions of developers with their strong capabilities to generate functionally correct code. Due to this popularity, it is crucial to investigate the security implications of relying on LLM-based code completion. In this work, we demonstrate that state-of-the-art black-box LLM-based code completion engines can be stealthily biased by adversaries to significantly increase their rate of insecure code generation. We present the first attack, named INSEC, that achieves this goal. INSEC works by injecting an attack string as a short comment in the completion input. The attack string is crafted through a query-based optimization procedure starting from a set of carefully designed initialization schemes. We demonstrate INSEC's broad applicability and effectiveness by evaluating it on various state-of-the-art open-source models and black-box commercial services (e.g., OpenAI API and GitHub Copilot). On a diverse set of security-critical test cases, covering 16 CWEs across 5 programming languages, INSEC increases the rate of generated insecure code by more than 50%, while maintaining the functional correctness of generated code. We consider INSEC practical: it requires low resources and costs less than 10 US dollars to develop on commodity hardware. Moreover, we showcase the attack's real-world deployability by developing an IDE plug-in that stealthily injects INSEC into the GitHub Copilot extension.
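The two ingredients named in the abstract, injecting an attack string as a comment in the completion input and optimizing that string using only black-box queries, can be illustrated with a minimal sketch. This is not INSEC's actual procedure. The placement (a comment line directly above the line being completed), the helper names `inject_comment` and `optimize_attack`, and the `score` oracle are hypothetical. Here `score` stands in for whatever the attacker measures by querying the engine, for example the fraction of sampled completions judged insecure. The optimizer is a plain greedy random search over tokens, and `init` stands in for the paper's initialization schemes.

```python
import random
from typing import Callable, List, Tuple


def inject_comment(prefix: str, attack: str, comment_token: str = "#") -> str:
    """Hypothetical placement: put the attack string in a one-line comment
    directly above the (partial) line the engine is asked to complete."""
    lines = prefix.split("\n")
    last = lines[-1]
    # Reuse the indentation of the line being completed so the comment blends in.
    indent = last[: len(last) - len(last.lstrip())]
    comment = f"{indent}{comment_token} {attack}"
    return "\n".join(lines[:-1] + [comment, last])


def optimize_attack(
    init: List[str],
    vocab: List[str],
    score: Callable[[str], float],
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[str, float]:
    """Greedy random search using only black-box calls to `score`
    (an assumed oracle; higher means the attack works better)."""
    rng = random.Random(seed)
    best = list(init)
    best_score = score(" ".join(best))
    for _ in range(iterations):
        candidate = list(best)
        # Mutate one randomly chosen token of the current best string.
        pos = rng.randrange(len(candidate))
        candidate[pos] = rng.choice(vocab)
        s = score(" ".join(candidate))
        # Accept ties as well, so the search can drift across flat regions.
        if s >= best_score:
            best, best_score = candidate, s
    return " ".join(best), best_score


if __name__ == "__main__":
    # Toy oracle for illustration only: rewards tokens from a fixed set.
    def toy_score(s: str) -> float:
        toks = s.split()
        return sum(t in {"a", "b", "c"} for t in toks) / len(toks)

    attack, s = optimize_attack(["d"] * 4, ["a", "b", "c", "d"], toy_score)
    print(attack, s)
    print(inject_comment("def f(x):\n    return ", attack))
```

Because the search only ever calls `score`, it needs no gradients or model weights. That property is what makes a query-based attack applicable to black-box services; the real attack's cost depends on how many completions each `score` call must sample.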

