CRJul 25, 2024
Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?Avanti Bhandarkar, Ronald Wilson, Anushka Swarup et al.
In the era of generative AI, the widespread adoption of Neural Text Generators (NTGs) presents new cybersecurity challenges, particularly within the realms of Digital Forensics and Incident Response (DFIR). These challenges primarily involve the detection and attribution of sources behind advanced attacks like spearphishing and disinformation campaigns. As NTGs evolve, the task of distinguishing between human and NTG-authored texts becomes critically complex. This paper rigorously evaluates the DFIR pipeline tailored for text-based security systems, specifically focusing on the challenges of detecting and attributing authorship of NTG-authored texts. By introducing a novel human-NTG co-authorship text attack, termed CS-ACT, our study uncovers significant vulnerabilities in traditional DFIR methodologies, highlighting discrepancies between ideal scenarios and real-world conditions. Utilizing 14 diverse datasets and 43 unique NTGs, up to the latest GPT-4, our research identifies substantial vulnerabilities in the forensic profiling phase, particularly in attributing authorship to NTGs. Our comprehensive evaluation points to factors such as model sophistication and the lack of distinctive style within NTGs as significant contributors for these vulnerabilities. Our findings underscore the necessity for more sophisticated and adaptable strategies, such as incorporating adversarial learning, stylizing NTGs, and implementing hierarchical attribution through the mapping of NTG lineages to enhance source attribution. This sets the stage for future research and the development of more resilient text-based security systems.
LGOct 9, 2025
Lyapunov-Stable Adaptive Control for Multimodal Concept DriftTianyu Bell Pan, Mengdi Zhu, Alexa Jordyn Cole et al.
Multimodal learning systems often struggle in non-stationary environments due to concept drift, where changing data distributions can degrade performance. Modality-specific drifts and the lack of mechanisms for continuous, stable adaptation compound this challenge. This paper introduces LS-OGD, a novel adaptive control framework for robust multimodal learning in the presence of concept drift. LS-OGD uses an online controller that dynamically adjusts the model's learning rate and the fusion weights between different data modalities in response to detected drift and evolving prediction errors. We prove that under bounded drift conditions, the LS-OGD system's prediction error is uniformly ultimately bounded and converges to zero if the drift ceases. Additionally, we demonstrate that the adaptive fusion strategy effectively isolates and mitigates the impact of severe modality-specific drift, thereby ensuring system resilience and fault tolerance. These theoretical guarantees establish a principled foundation for developing reliable and continuously adapting multimodal learning systems.
CLSep 13, 2019
A Neural Approach to Irony GenerationMengdi Zhu, Zhiwei Yu, Xiaojun Wan
Ironies can not only express stronger emotions but also show a sense of humor. With the development of social media, ironies are widely used in public. Although many prior research studies have been conducted in irony detection, few studies focus on irony generation. The main challenges for irony generation are the lack of large-scale irony dataset and difficulties in modeling the ironic pattern. In this work, we first systematically define irony generation based on style transfer task. To address the lack of data, we make use of twitter and build a large-scale dataset. We also design a combination of rewards for reinforcement learning to control the generation of ironic sentences. Experimental results demonstrate the effectiveness of our model in terms of irony accuracy, sentiment preservation, and content preservation.
CLSep 13, 2019
Neural Correction Model for Open-Domain Named Entity RecognitionMengdi Zhu, Zheye Deng, Wenhan Xiong et al.
Named Entity Recognition (NER) plays an important role in a wide range of natural language processing tasks, such as relation extraction, question answering, etc. However, previous studies on NER are limited to particular genres, using small manually-annotated or large but low-quality datasets. Meanwhile, previous datasets for open-domain NER, built using distant supervision, suffer from low precision, recall and ratio of annotated tokens (RAT). In this work, to address the low precision and recall problems, we first utilize DBpedia as the source of distant supervision to annotate abstracts from Wikipedia and design a neural correction model trained with a human-annotated NER dataset, DocRED, to correct the false entity labels. In this way, we build a large and high-quality dataset called AnchorNER and then train various models with it. To address the low RAT problem of previous datasets, we introduce a multi-task learning method to exploit the context information. We evaluate our methods on five NER datasets and our experimental results show that models trained with AnchorNER and our multi-task learning method obtain state-of-the-art performances in the open-domain setting.