Doowon Kim

CR
h-index: 4
4 papers
138 citations
Novelty: 44%
AI Score: 35

4 Papers

CR · Sep 16, 2025
A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs

Kiho Lee, Jungkon Kim, Doowon Kim et al.

Code-generating Large Language Models (LLMs) significantly accelerate software development. However, their frequent generation of insecure code presents serious risks. We present a comprehensive evaluation of seven parameter-efficient fine-tuning (PEFT) techniques, demonstrating substantial gains in secure code generation without compromising functionality. Our research identifies prompt-tuning as the most effective PEFT method, achieving an 80.86% Overall-Secure-Rate on CodeGen2 16B, a 13.5-point improvement over the 67.28% baseline. Optimizing decoding strategies through sampling temperature further elevated security to 87.65%. This equates to a reduction of approximately 203,700 vulnerable code snippets per million generated. Moreover, prompt and prefix tuning increase robustness against poisoning attacks in our TrojanPuzzle evaluation, with strong performance against CWE-79 and CWE-502 attack vectors. Our findings generalize across Python and Java, confirming prompt-tuning's consistent effectiveness. This study provides essential insights and practical guidance for building more resilient software systems with LLMs.
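
The headline figures in this abstract can be checked with simple arithmetic. The sketch below recomputes the percentage-point gains and the per-million reduction from the three reported rates. The variable and function names are illustrative and not taken from the paper, and the sketch assumes the per-million figure is measured between the 67.28% baseline and the 87.65% temperature-tuned result.

```python
# Rates as reported in the abstract (percent of generations judged secure).
BASELINE_RATE = 67.28
PROMPT_TUNING_RATE = 80.86
TUNED_TEMPERATURE_RATE = 87.65


def point_gain(new_rate: float, old_rate: float) -> float:
    """Percentage-point difference between two rates, rounded to 2 places."""
    return round(new_rate - old_rate, 2)


def fewer_vulnerable_per_million(new_rate: float, old_rate: float) -> int:
    """Reduction in vulnerable snippets per 1,000,000 generations.

    Assumes every generation that is not counted as secure is vulnerable.
    """
    return round((new_rate - old_rate) / 100 * 1_000_000)


prompt_gain = point_gain(PROMPT_TUNING_RATE, BASELINE_RATE)
total_gain = point_gain(TUNED_TEMPERATURE_RATE, BASELINE_RATE)
reduction = fewer_vulnerable_per_million(TUNED_TEMPERATURE_RATE, BASELINE_RATE)

print(prompt_gain)  # gain from prompt-tuning alone
print(total_gain)   # gain after also tuning the sampling temperature
print(reduction)    # fewer vulnerable snippets per million generations
```

Under these assumptions the prompt-tuning gain comes out at 13.58 points, in line with the abstract's "13.5-point" figure, and the combined 20.37-point gain corresponds to the quoted reduction of roughly 203,700 snippets per million.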

CR · Jun 10, 2024
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection

Shenao Yan, Shen Wang, Yue Duan et al.

Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CodeBreaker leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CodeBreaker stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CodeBreaker across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures, underscoring the critical need for more robust defenses for code completion.

CR · Mar 23, 2021
Scam Pandemic: How Attackers Exploit Public Fear through Phishing

Marzieh Bitaab, Haehyun Cho, Adam Oest et al.

As the COVID-19 pandemic started triggering widespread lockdowns across the globe, cybercriminals did not hesitate to take advantage of users' increased usage of the Internet and their reliance on it. In this paper, we carry out a comprehensive measurement study of online social engineering attacks in the early months of the pandemic. By collecting, synthesizing, and analyzing DNS records, TLS certificates, phishing URLs, phishing website source code, phishing emails, web traffic to phishing websites, news articles, and government announcements, we track trends of phishing activity between January and May 2020 and seek to understand the key implications of the underlying trends. We find that phishing attack traffic in March and April 2020 skyrocketed up to 220% of its pre-COVID-19 rate, far exceeding typical seasonal spikes. Attackers exploited victims' uncertainty and fear related to the pandemic through a variety of highly targeted scams, including emerging scam types against which current defenses are not sufficient as well as traditional phishing which outpaced the ecosystem's collective response.

CR · Mar 8, 2018
Issued for Abuse: Measuring the Underground Trade in Code Signing Certificate

Kristián Kozák, Bum Jun Kwon, Doowon Kim et al.

Recent measurements of the Windows code-signing certificate ecosystem have highlighted various forms of abuse that allow malware authors to produce malicious code carrying valid digital signatures. However, the underground trade that allows miscreants to acquire such certificates is not well understood. In this paper, we illuminate two aspects of this trade. First, we investigate 4 leading vendors of Authenticode certificates, we document how they conduct business, and we estimate their market share. Second, we collect a data set of recently signed malware and we use it to study the relationships among malware developers, malware families and the certificates. We also use information from the black market to fingerprint the certificates traded and to identify when they are likely used to sign malware in the wild. Using these methods, we document a shift in the methods that malware authors employ to obtain valid digital signatures. While prior studies have reported the use of code-signing certificates that had been compromised or obtained directly from legitimate Certification Authorities, we observe that, in 2017, these methods have become secondary to purchasing certificates from underground vendors. We also find that the need to bypass platform protections such as Microsoft Defender SmartScreen plays a growing role in driving the demand for Authenticode certificates. Together, these findings suggest that the trade in certificates issued for abuse represents an emerging segment of the underground economy.