CLJun 21, 2022
TraSE: Towards Tackling Authorial Style from a Cognitive Science PerspectiveRonald Wilson, Avanti Bhandarkar, Damon Woodard
Stylistic analysis of text is a key task in research areas ranging from authorship attribution to forensic analysis and personality profiling. The existing approaches for stylistic analysis are plagued by issues like topic influence, lack of discriminability for large number of authors and the requirement for large amounts of diverse data. In this paper, the source of these issues are identified along with the necessity for a cognitive perspective on authorial style in addressing them. A novel feature representation, called Trajectory-based Style Estimation (TraSE), is introduced to support this purpose. Authorship attribution experiments with over 27,000 authors and 1.4 million samples in a cross-domain scenario resulted in 90% attribution accuracy suggesting that the feature representation is immune to such negative influences and an excellent candidate for stylistic analysis. Finally, a qualitative analysis is performed on TraSE using physical human characteristics, like age, to validate its claim on capturing cognitive traits.
CRJul 25, 2024
Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?Avanti Bhandarkar, Ronald Wilson, Anushka Swarup et al.
In the era of generative AI, the widespread adoption of Neural Text Generators (NTGs) presents new cybersecurity challenges, particularly within the realms of Digital Forensics and Incident Response (DFIR). These challenges primarily involve the detection and attribution of sources behind advanced attacks like spearphishing and disinformation campaigns. As NTGs evolve, the task of distinguishing between human and NTG-authored texts becomes critically complex. This paper rigorously evaluates the DFIR pipeline tailored for text-based security systems, specifically focusing on the challenges of detecting and attributing authorship of NTG-authored texts. By introducing a novel human-NTG co-authorship text attack, termed CS-ACT, our study uncovers significant vulnerabilities in traditional DFIR methodologies, highlighting discrepancies between ideal scenarios and real-world conditions. Utilizing 14 diverse datasets and 43 unique NTGs, up to the latest GPT-4, our research identifies substantial vulnerabilities in the forensic profiling phase, particularly in attributing authorship to NTGs. Our comprehensive evaluation points to factors such as model sophistication and the lack of distinctive style within NTGs as significant contributors for these vulnerabilities. Our findings underscore the necessity for more sophisticated and adaptable strategies, such as incorporating adversarial learning, stylizing NTGs, and implementing hierarchical attribution through the mapping of NTG lineages to enhance source attribution. This sets the stage for future research and the development of more resilient text-based security systems.
27.1CRApr 21
A Data-Free Membership Inference Attack on Federated Learning in Hardware AssuranceGijung Lee, Wavid Bowman, Olivia P. Dizon-Paradis et al.
Federated Learning (FL) is an emerging solution to the data scarcity problem for training deep learning models in hardware assurance. While FL is designed to enhance privacy by not sharing raw data, it remains vulnerable to Membership Inference Attacks (MIAs) that can leak sensitive intellectual property (IP). Traditional MIAs are often impractical in this domain because they require access to auxiliary datasets that can match the unique statistical properties of private data. This paper introduces a novel, data-free MIA targeting image segmentation models in FL for hardware assurance. Our methodology leverages Standard Cell Library Layouts (SCLLs) as priors to guide a gradient inversion attack, allowing an adversary to reconstruct images from a client's intercepted model update without needing any private data. We demonstrate that, by analyzing the reconstruction fidelity, an adversary can infer sensitive hardware characteristics, successfully distinguishing between circuit layers (e.g., metal vs. diffusion) and technology nodes (e.g., 32nm vs. 90nm). Our findings reveal that a novel loss term can conditionally amplify the attack's effectiveness by overcoming evaluation bottlenecks for structurally complex data. This work underscores a significant IP risk, challenging the assumption that FL provides inherent privacy guarantees and proving that severe information leakage can occur even without access to domain-specific datasets.
26.8CRApr 21
Potentials and Pitfalls of Applying Federated Learning in Hardware AssuranceGijung Lee, Wavid Bowman, Olivia Dizon-Paradis et al.
As microelectronics flourish and outsourcing of the design and manufacturing stages of integrated circuits (ICs) and printed circuit boards (PCBs) becomes the norm, microelectronics stakeholders must also confront a new wave of security challenges, including the threats posed by hardware Trojans, counterfeit electronics, and reverse engineering attacks. Traditional detection and prevention methods like testing and side-channel analysis have limitations in reliability and scalability. Automated reverse engineering by deep learning (DL) models is a foolproof approach to hardware assurance, but faces challenges due to limited data. By pooling data from different stakeholders (competitors in industry, governments, etc.), DL models can be more effectively trained but privacy of intellectual property (IP) is a significant concern. Federated Learning (FL) has been proposed as a potential alternative allowing for the collaborative training of a DL model without sharing raw data. While FL has been widely used in healthcare, IoT, and finance, its application in hardware assurance remains underexplored. This study investigates, for the first time, FL-based DL for hardware assurance, demonstrating that FL outperforms single-client centralized learning in segmentation tasks for reverse engineering. Our results show that increasing the number of clients improves FL performance by collaboratively training the model with more data. However, and more importantly, a major pitfall of FL is also exposed -- it remains vulnerable to gradient inversion attacks. We show that SEM images used in FL can be recovered by attackers, which would therefore expose the sensitive and proprietary IPs that FL was supposed to protect. We highlight these privacy risks and also suggest future research directions to improve security and effectiveness in hardware assurance.
33.3CRApr 21
DECIFR: Domain-Aware Exfiltration of Circuit Information from Federated Gradient ReconstructionGijung Lee, Wavid Bowman, Olivia P. Dizon-Paradis et al.
Federated Learning (FL) is a promising approach for multiparty collaboration as a privacy-preserving technique in hardware assurance, but its security against adversaries with domain-specific knowledge is underexplored. This paper demonstrates a critical vulnerability where available standard cell library layouts (SCLL) can be exploited to compromise the privacy of sensitive integrated circuit (IC) training data. We introduce DECIFR, a novel two-stage Membership Inference Attack (MIA) that requires no auxiliary dataset. The attack employs a guided Gradient Inversion Attack (GIA) to reconstruct a client's training images from intercepted model updates. Our findings reveal that the fidelity of these reconstructions directly correlates with membership status, allowing an adversary to reliably distinguish members from non-members based on image quality. This work exposes a practical threat that overcomes the limitations of conventional attacks and underscores that standard FL protocols are insufficient for securing domains with extensive knowledge. We conclude that robust defenses are essential for the secure application of FL in hardware assurance.
CLSep 7, 2024
Maximizing Relation Extraction Potential: A Data-Centric Study to Unveil Challenges and OpportunitiesAnushka Swarup, Avanti Bhandarkar, Olivia P. Dizon-Paradis et al.
Relation extraction is a Natural Language Processing task that aims to extract relationships from textual data. It is a critical step for information extraction. Due to its wide-scale applicability, research in relation extraction has rapidly scaled to using highly advanced neural networks. Despite their computational superiority, modern relation extractors fail to handle complicated extraction scenarios. However, a comprehensive performance analysis of the state-of-the-art extractors that compile these challenges has been missing from the literature, and this paper aims to bridge this gap. The goal has been to investigate the possible data-centric characteristics that impede neural relation extraction. Based on extensive experiments conducted using 15 state-of-the-art relation extraction algorithms ranging from recurrent architectures to large language models and seven large-scale datasets, this research suggests that modern relation extractors are not robust to complex data and relation characteristics. It emphasizes pivotal issues, such as contextual ambiguity, correlating relations, long-tail data, and fine-grained relation distributions. In addition, it sets a marker for future directions to alleviate these issues, thereby proving to be a critical resource for novice and advanced researchers. Efficient handling of the challenges described can have significant implications for the field of information extraction, which is a critical part of popular systems such as search engines and chatbots. Data and relevant code can be found at \url{https://aaig.ece.ufl.edu/projects/relation-extraction}.
LGOct 9, 2025
Lyapunov-Stable Adaptive Control for Multimodal Concept DriftTianyu Bell Pan, Mengdi Zhu, Alexa Jordyn Cole et al.
Multimodal learning systems often struggle in non-stationary environments due to concept drift, where changing data distributions can degrade performance. Modality-specific drifts and the lack of mechanisms for continuous, stable adaptation compound this challenge. This paper introduces LS-OGD, a novel adaptive control framework for robust multimodal learning in the presence of concept drift. LS-OGD uses an online controller that dynamically adjusts the model's learning rate and the fusion weights between different data modalities in response to detected drift and evolving prediction errors. We prove that under bounded drift conditions, the LS-OGD system's prediction error is uniformly ultimately bounded and converges to zero if the drift ceases. Additionally, we demonstrate that the adaptive fusion strategy effectively isolates and mitigates the impact of severe modality-specific drift, thereby ensuring system resilience and fault tolerance. These theoretical guarantees establish a principled foundation for developing reliable and continuously adapting multimodal learning systems.
IVApr 28, 2020
Histogram-based Auto Segmentation: A Novel Approach to Segmenting Integrated Circuit Structures from SEM ImagesRonald Wilson, Navid Asadizanjani, Domenic Forte et al.
In the Reverse Engineering and Hardware Assurance domain, a majority of the data acquisition is done through electron microscopy techniques such as Scanning Electron Microscopy (SEM). However, unlike its counterparts in optical imaging, only a limited number of techniques are available to enhance and extract information from the raw SEM images. In this paper, we introduce an algorithm to segment out Integrated Circuit (IC) structures from the SEM image. Unlike existing algorithms discussed in this paper, this algorithm is unsupervised, parameter-free and does not require prior information on the noise model or features in the target image making it effective in low quality image acquisition scenarios as well. Furthermore, the results from the application of the algorithm on various structures and layers in the IC are reported and discussed.
IVFeb 11, 2020
Hardware Trust and Assurance through Reverse Engineering: A Survey and Outlook from Image Analysis and Machine Learning PerspectivesUlbert J. Botero, Ronald Wilson, Hangwei Lu et al.
In the context of hardware trust and assurance, reverse engineering has been often considered as an illegal action. Generally speaking, reverse engineering aims to retrieve information from a product, i.e., integrated circuits (ICs) and printed circuit boards (PCBs) in hardware security-related scenarios, in the hope of understanding the functionality of the device and determining its constituent components. Hence, it can raise serious issues concerning Intellectual Property (IP) infringement, the (in)effectiveness of security-related measures, and even new opportunities for injecting hardware Trojans. Ironically, reverse engineering can enable IP owners to verify and validate the design. Nevertheless, this cannot be achieved without overcoming numerous obstacles that limit successful outcomes of the reverse engineering process. This paper surveys these challenges from two complementary perspectives: image processing and machine learning. These two fields of study form a firm basis for the enhancement of efficiency and accuracy of reverse engineering processes for both PCBs and ICs. In summary, therefore, this paper presents a roadmap indicating clearly the actions to be taken to fulfill hardware trust and assurance objectives.