CL Sep 21, 2024
Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm
Jaehan Kim, Minkyoo Song, Seung Ho Na et al.
Parameter-efficient fine-tuning (PEFT) has become a key training strategy for large language models. However, its reliance on fewer trainable parameters poses security risks, such as task-agnostic backdoors. Despite their severe impact on a wide range of tasks, no practical defense effectively counters task-agnostic backdoors in the context of PEFT. In this study, we introduce Obliviate, a PEFT-integrable backdoor defense. We develop two techniques: one amplifies benign neurons within PEFT layers, and the other penalizes the influence of trigger tokens. Our evaluations across three major PEFT architectures show that our method significantly reduces the attack success rate of state-of-the-art task-agnostic backdoors (83.6%↓). Furthermore, our method exhibits robust defense capabilities against both task-specific backdoors and adaptive attacks. Source code is available at https://github.com/obliviateARR/Obliviate.
CR Oct 18, 2024
When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs
Hanna Kim, Minkyoo Song, Seung Ho Na et al.
Recent advancements in Large Language Models (LLMs) have established them as agentic systems capable of planning and interacting with various tools. These LLM agents are often paired with web-based tools, enabling access to diverse sources and real-time information. Although these advancements offer significant benefits across various applications, they also increase the risk of malicious use, particularly in cyberattacks involving personal information. In this work, we investigate the risks of misusing LLM agents in such cyberattacks. Specifically, we aim to understand: 1) how potent LLM agents can be when directed to conduct cyberattacks, 2) how cyberattacks are enhanced by web-based tools, and 3) how affordable and easy it becomes to launch cyberattacks using LLM agents. We examine three attack scenarios: the collection of Personally Identifiable Information (PII), the generation of impersonation posts, and the creation of spear-phishing emails. Our experiments reveal the effectiveness of LLM agents in these attacks: LLM agents achieved a precision of up to 95.9% in collecting PII, generated impersonation posts of which 93.9% were deemed authentic, and boosted the click rate of phishing links in spear-phishing emails by 46.67%. Additionally, our findings underscore the limitations of existing safeguards in contemporary commercial LLMs, emphasizing the urgent need for robust security measures to prevent the misuse of LLM agents.
SI Sep 13, 2021
Meta-Path-based Fake News Detection Leveraging Multi-level Social Context Information
Jian Cui, Kwanwoo Kim, Seung Ho Na et al.
Fake news, false or misleading information presented as news, has a significant impact on many aspects of society, such as politics and healthcare. Because fake news is deceptive by nature, applying Natural Language Processing (NLP) techniques to the news content alone is insufficient. Multi-level social context information (news publishers and engaged users in social media) and temporal information about user engagement are also important for fake news detection. Using this information properly, however, introduces three chronic difficulties: 1) multi-level social context information is hard to use without information loss, 2) temporal information is hard to use together with multi-level social context information, and 3) a news representation that combines multi-level social context and temporal information is hard to learn in an end-to-end manner. To overcome all three difficulties, we propose a novel fake news detection framework, Hetero-SCAN. We use Meta-Path to extract meaningful multi-level social context information without loss. Meta-Path, a composite relation connecting two node types, captures the semantics of the heterogeneous graph. We then propose Meta-Path instance encoding and aggregation methods to capture the temporal information of user engagement and produce the news representation end-to-end. In our experiments, Hetero-SCAN yields a significant performance improvement over state-of-the-art fake news detection methods.
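
To make the Meta-Path idea concrete, the following is a minimal sketch of enumerating Meta-Path instances on a toy heterogeneous graph of publishers, news items, and users. It is not the Hetero-SCAN implementation: the node names, the edge list, and the helpers `build_adjacency` and `metapath_instances` are hypothetical, edges are treated as undirected, and the temporal encoding and aggregation steps described in the abstract are not shown.

```python
from collections import defaultdict

# Toy heterogeneous graph (all node names are hypothetical).
node_type = {
    "p1": "publisher",
    "n1": "news", "n2": "news",
    "u1": "user", "u2": "user",
}
edges = [
    ("p1", "n1"), ("p1", "n2"),  # publisher publishes news
    ("n1", "u1"), ("n1", "u2"),  # user engages with news
    ("n2", "u2"),
]

def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def metapath_instances(adj, node_type, metapath):
    """Enumerate simple paths whose node types match `metapath` in order."""
    paths = [[n] for n in sorted(node_type) if node_type[n] == metapath[0]]
    for wanted in metapath[1:]:
        paths = [
            p + [nb]
            for p in paths
            for nb in sorted(adj[p[-1]])
            if node_type[nb] == wanted and nb not in p  # no repeated nodes
        ]
    return paths

adj = build_adjacency(edges)
news_user_news = metapath_instances(adj, node_type, ["news", "user", "news"])
pub_news_user = metapath_instances(adj, node_type, ["publisher", "news", "user"])
print(news_user_news)
print(pub_news_user)
```

In this toy graph, the news-user-news Meta-Path links two news items through a shared user, while publisher-news-user links a publisher to the users who engaged with its news. A detection model would then encode and aggregate such instances, which is outside the scope of this sketch.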