LG · Feb 21, 2024
Learning to Poison Large Language Models for Downstream Manipulation
Xiangyu Zhou, Yao Qiang, Saleh Zare Zade et al.
The advent of Large Language Models (LLMs) has marked significant achievements in language processing and reasoning capabilities. Despite these advances, LLMs remain vulnerable to data poisoning attacks, in which an adversary inserts backdoor triggers into training data to manipulate outputs. This work identifies additional security risks in LLMs by designing a new data poisoning attack tailored to exploit the supervised fine-tuning (SFT) process. We propose a novel gradient-guided backdoor trigger learning (GBTL) algorithm that identifies adversarial triggers efficiently, evading detection by conventional defenses while maintaining content integrity. In experiments across various language model tasks, including sentiment analysis, domain generation, and question answering, our poisoning strategy achieves a high success rate in compromising the outputs of various LLMs. We further propose two defense strategies against data poisoning attacks, in-context learning (ICL) and continuous learning (CL), which effectively rectify the behavior of LLMs and significantly reduce the decline in performance. Our work highlights the significant security risks present during SFT of LLMs and the necessity of safeguarding LLMs against data poisoning attacks.
LG · Jun 3, 2025
Not All Tokens Are Meant to Be Forgotten
Xiangyu Zhou, Yao Qiang, Saleh Zare Zade et al.
Large Language Models (LLMs), pre-trained on massive text corpora, exhibit remarkable human-level language understanding, reasoning, and decision-making abilities. However, they tend to memorize unwanted information, such as private or copyrighted content, raising significant privacy and legal concerns. Unlearning has emerged as a promising solution, but existing methods suffer from over-forgetting: they indiscriminately suppress the generation of all tokens in the forget samples, which leads to a substantial loss of model utility. To overcome this challenge, we introduce the Targeted Information Forgetting (TIF) framework, which consists of (1) a flexible targeted information identifier designed to differentiate between unwanted words (UW) and general words (GW) in the forget samples, and (2) a novel Targeted Preference Optimization approach that uses a Logit Preference Loss to unlearn unwanted information associated with UW and a Preservation Loss to retain general information in GW, improving the unlearning process while mitigating utility degradation. Extensive experiments on the TOFU and MUSE benchmarks demonstrate that the proposed TIF framework enhances unlearning effectiveness while preserving model utility, achieving state-of-the-art results.
HC · Jan 6, 2022
Designing Social VR: A Collection of Design Choices Across Commercial and Research Applications
Ryan Handley, Bert Guerra, Rukkmini Goli et al.
Social VR has experienced tremendous growth in the commercial space recently as an emerging technology for rich interactions themed around leisure, work, and relationship building. As a result, the state of social VR application design has become rapidly obfuscated, which complicates identification of design trends and uncommon features that could inform future design, and hinders inclusion of new voices in this design space. To help address this problem, we present a taxonomy of social VR application design choices as informed by 44 commercial and prototypical applications. Our taxonomy was informed by multiple discovery strategies including literature review, search of VR-themed subreddits, and autobiographical landscape research. The taxonomy elucidates various features across three design areas: the self, interaction, and the environment.
HC · Jun 26, 2021
Immersive Stories for Health Information: Design Considerations from Binge Drinking in VR
Douglas Zytko, Zexin Ma, Jacob Gleason et al.
Immersive stories for health are 360-degree videos that intend to alter viewer perceptions about behaviors detrimental to health. They have the potential to inform public health at scale; however, immersive story design is still in its early stages and largely devoid of best practices. This paper presents a focus group study with 147 viewers of an immersive story about binge drinking experienced through VR headsets and mobile phones. The objective of the study is to identify aspects of immersive story design that influence attitudes towards the health issue exhibited, and to understand how health information is consumed in immersive stories. Findings emphasize the need for an immersive story to provide reasoning behind character engagement in the focal health behavior, to show the main character clearly engaging in the behavior, and to enable viewers to experience escalating symptoms of the behavior before the penultimate health consequence. Findings also show how the design of supporting characters can inadvertently distract viewers and lead them to justify the detrimental behavior being exhibited. The paper concludes with design considerations for enabling immersive stories to better inform public perception of health issues.
HC · May 16, 2021
Computer-Mediated Consent to Sex: The Context of Tinder
Douglas Zytko, Nicholas Furlo, Bailey Carlin et al.
This paper reports an interview study about how consent to sexual activity is computer-mediated. The study's context of online dating is chosen due to the prevalence of sexual violence, or nonconsensual sexual activity, that is associated with dating app use. Participants (n=19) represent a range of gender identities and sexual orientations, and predominantly used the dating app Tinder. Findings reveal two computer-mediated consent processes: consent signaling and affirmative consent. With consent signaling, users employed Tinder's interface to infer and imply agreement to sex without any explicit confirmation before making sexual advances in person. With affirmative consent, users employed the interface to establish patterns of overt discourse around sex and consent across online and offline modalities. The paper elucidates shortcomings of both computer-mediated consent processes that leave users susceptible to sexual violence, and envisions dating apps as potential sexual violence prevention solutions if deliberately designed to mediate consent exchange.