Yier Jin

CR
h-index45
17papers
368citations
Novelty44%
AI Score47

17 Papers

CRFeb 12
Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt

Yujie Gu, Richeng Jin, Xiaoyu Ji et al.

Large Language Models (LLMs) have achieved remarkable performance and received significant research interest. The enormous computational demands, however, hinder the local deployment on devices with limited resources. The current prevalent LLM inference paradigms require users to send queries to the service providers for processing, which raises critical privacy concerns. Existing approaches propose to allow the users to obfuscate the token embeddings before transmission and utilize local models for denoising. Nonetheless, transmitting the token embeddings and deploying local models may result in excessive communication and computation overhead, preventing practical implementation. In this work, we propose \textbf{DEL}, a framework for \textbf{D}ifferentially private and communication \textbf{E}fficient \textbf{L}LM split inference. More specifically, an embedding projection module and a differentially private stochastic quantization mechanism are proposed to reduce the communication overhead in a privacy-preserving manner. To eliminate the need for local models, we adapt soft prompt at the server side to compensate for the utility degradation caused by privacy. To the best of our knowledge, this is the first work that utilizes soft prompt to improve the trade-off between privacy and utility in LLM inference, and extensive experiments on text generation and natural language understanding benchmarks demonstrate the effectiveness of the proposed method.

CRAug 27, 2019Code
On the (In)security of Bluetooth Low Energy One-Way Secure Connections Only Mode

Yue Zhang, Jian Weng, Rajib Dey et al.

To defeat security threats such as man-in-the-middle (MITM) attacks, Bluetooth Low Energy (BLE) 4.2 and 5.x introduce the Secure Connections Only mode, under which a BLE device accepts only secure paring protocols including Passkey Entry and Numeric Comparison from an initiator, e.g., an Android mobile. However, the BLE specification does not explicitly require the Secure Connection Only mode of the initiator. Taking the Android's BLE programming framework for example, we found that it cannot enforce secure pairing, invalidating the security protection provided by the Secure Connection Only mode. The same problem applies to Apple iOS too. Specifically, we examine the life cycle of a BLE pairing process in Android and identify four severe design flaws. These design flaws can be exploited by attackers to perform downgrading attacks, forcing the BLE pairing protocols to run in the insecure mode without the users' awareness. To validate our findings, we selected and tested 18 popular BLE commercial products and our experimental results proved that downgrading attacks and MITM attacks were all possible to these products. All 3501 BLE apps from Androzoo are also subject to these attacks. For defense, we have designed and implemented a prototype of the Secure Connection Only mode on Android 8 through the Android Open Source Project (AOSP). We have reported the identified BLE pairing vulnerabilities to Bluetooth Special Interest Group (SIG), Google, Apple, Texas Instruments (TI) and all of them are actively addressing this issue. Google rated the reported security flaw a High Severity.

60.3CRMay 7
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

Qinfeng Li, Yuntai Bao, Jianghui Hu et al.

LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property. However, in untrusted deployments, adversaries can copy and reuse these prompts with other proprietary LLMs, causing economic losses. To protect these prompts, we identify four key challenges: proactivity, runtime protection, usability, and non-portability that existing approaches fail to address. We present PragLocker, a prompt protection scheme that satisfies these requirements. PragLocker constructs function-preserving obfuscated prompts by anchoring semantics with code symbols and then using target-model feedback to inject noise, yielding prompts that only work on the target LLM. Experiments across multiple agent systems, datasets, and foundation LLMs show that PragLocker substantially reduces cross-LLM portability, maintains target performance, and remains robust against adaptive attackers.

CRDec 4, 2025
CryptoTensors: A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution

Huifeng Zhu, Shijie Li, Qinfeng Li et al.

To enhance the performance of large language models (LLMs) in various domain-specific applications, sensitive data such as healthcare, law, and finance are being used to privately customize or fine-tune these models. Such privately adapted LLMs are regarded as either personal privacy assets or corporate intellectual property. Therefore, protecting model weights and maintaining strict confidentiality during deployment and distribution have become critically important. However, existing model formats and deployment frameworks provide little to no built-in support for confidentiality, access control, or secure integration with trusted hardware. Current methods for securing model deployment either rely on computationally expensive cryptographic techniques or tightly controlled private infrastructure. Although these approaches can be effective in specific scenarios, they are difficult and costly for widespread deployment. In this paper, we introduce CryptoTensors, a secure and format-compatible file structure for confidential LLM distribution. Built as an extension to the widely adopted Safetensors format, CryptoTensors incorporates tensor-level encryption and embedded access control policies, while preserving critical features such as lazy loading and partial deserialization. It enables transparent decryption and automated key management, supporting flexible licensing and secure model execution with minimal overhead. We implement a proof-of-concept library, benchmark its performance across serialization and runtime scenarios, and validate its compatibility with existing inference frameworks, including Hugging Face Transformers and vLLM. Our results highlight CryptoTensors as a light-weight, efficient, and developer-friendly solution for safeguarding LLM weights in real-world and widespread deployments.

CRFeb 3, 2024
A Review and Comparison of AI Enhanced Side Channel Analysis

Max Panoff, Honggang Yu, Haoqi Shan et al.

Side Channel Analysis (SCA) presents a clear threat to privacy and security in modern computing systems. The vast majority of communications are secured through cryptographic algorithms. These algorithms are often provably-secure from a cryptographical perspective, but their implementation on real hardware introduces vulnerabilities. Adversaries can exploit these vulnerabilities to conduct SCA and recover confidential information, such as secret keys or internal states. The threat of SCA has greatly increased as machine learning, and in particular deep learning, enhanced attacks become more common. In this work, we will examine the latest state-of-the-art deep learning techniques for side channel analysis, the theory behind them, and how they are conducted. Our focus will be on profiling attacks using deep learning techniques, but we will also examine some new and emerging methodologies enhanced by deep learning techniques, such as non-profiled attacks, artificial trace generation, and others. Finally, different deep learning enhanced SCA schemes attempted against the ANSSI SCA Database (ASCAD) and their relative performance will be evaluated and compared. This will lead to new research directions to secure cryptographic implementations against the latest SCA attacks.

CLJan 27, 2024
Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain Specific Knowledge

Weimin Fu, Shijie Li, Yifang Zhao et al.

In the rapidly evolving semiconductor industry, where research, design, verification, and manufacturing are intricately linked, the potential of Large Language Models to revolutionize hardware design and security verification is immense. The primary challenge, however, lies in the complexity of hardware specific issues that are not adequately addressed by the natural language or software code knowledge typically acquired during the pretraining stage. Additionally, the scarcity of datasets specific to the hardware domain poses a significant hurdle in developing a foundational model. Addressing these challenges, this paper introduces Hardware Phi 1.5B, an innovative large language model specifically tailored for the hardware domain of the semiconductor industry. We have developed a specialized, tiered dataset comprising small, medium, and large subsets and focused our efforts on pretraining using the medium dataset. This approach harnesses the compact yet efficient architecture of the Phi 1.5B model. The creation of this first pretrained, hardware domain specific large language model marks a significant advancement, offering improved performance in hardware design and verification tasks and illustrating a promising path forward for AI applications in the semiconductor sector.

CROct 16, 2024
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

Qinfeng Li, Tianyue Luo, Xuhong Zhang et al.

Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.

LGSep 22, 2021
Exploring Adversarial Examples for Efficient Active Learning in Machine Learning Classifiers

Honggang Yu, Shihfeng Zeng, Teng Zhang et al.

Machine learning researchers have long noticed the phenomenon that the model training process will be more effective and efficient when the training samples are densely sampled around the underlying decision boundary. While this observation has already been widely applied in a range of machine learning security techniques, it lacks theoretical analyses of the correctness of the observation. To address this challenge, we first add particular perturbation to original training examples using adversarial attack methods so that the generated examples could lie approximately on the decision boundary of the ML classifiers. We then investigate the connections between active learning and these particular training examples. Through analyzing various representative classifiers such as k-NN classifiers, kernel methods as well as deep neural networks, we establish a theoretical foundation for the observation. As a result, our theoretical proofs provide support to more efficient active learning methods with the help of adversarial examples, contrary to previous works where adversarial examples are often used as destructive solutions. Experimental results show that the established theoretical foundation will guide better active learning strategies based on adversarial examples.

SYMar 25, 2021
CHIMERA: A Hybrid Estimation Approach to Limit the Effects of False Data Injection Attacks

Xiaorui Liu, Yaodan Hu, Charalambos Konstantinou et al.

The reliable operation of power grid is supported by energy management systems (EMS) that provide monitoring and control functionalities. Contingency analysis is a critical application of EMS to evaluate the impacts of outages and prepare for system failures. However, false data injection attacks (FDIAs) have demonstrated the possibility of compromising sensor measurements and falsifying the estimated power system states. As a result, FDIAs may mislead system operations and other EMS applications including contingency analysis and optimal power flow. In this paper, we assess the effect of FDIAs and demonstrate that such attacks can affect the resulted number of contingencies. In order to mitigate the FDIA impact, we propose CHIMERA, a hybrid attack-resilient state estimation approach that integrates model-based and data-driven methods. CHIMERA combines the physical grid information with a Long Short Term Memory (LSTM)-based deep learning model by considering a static loss of weighted least square errors and a dynamic loss of the difference between the temporal variations of the actual and the estimated active power. Our simulation experiments based on the load data from New York state demonstrate that CHIMERA can effectively mitigate 91.74% of the cases in which FDIAs can maliciously modify the contingencies.

SYAug 16, 2020
A Survey of Machine Learning Methods for Detecting False Data Injection Attacks in Power Systems

Ali Sayghe, Yaodan Hu, Ioannis Zografopoulos et al.

Over the last decade, the number of cyberattacks targeting power systems and causing physical and economic damages has increased rapidly. Among them, False Data Injection Attacks (FDIAs) is a class of cyberattacks against power grid monitoring systems. Adversaries can successfully perform FDIAs in order to manipulate the power system State Estimation (SE) by compromising sensors or modifying system data. SE is an essential process performed by the Energy Management System (EMS) towards estimating unknown state variables based on system redundant measurements and network topology. SE routines include Bad Data Detection (BDD) algorithms to eliminate errors from the acquired measurements, e.g., in case of sensor failures. FDIAs can bypass BDD modules to inject malicious data vectors into a subset of measurements without being detected, and thus manipulate the results of the SE process. In order to overcome the limitations of traditional residual-based BDD approaches, data-driven solutions based on machine learning algorithms have been widely adopted for detecting malicious manipulation of sensor data due to their fast execution times and accurate results. This paper provides a comprehensive review of the most up-to-date machine learning methods for detecting FDIAs against power system SE algorithms.

CRJan 17, 2019
RTL-PSC: Automated Power Side-Channel Leakage Assessment at Register-Transfer Level

Miao, He, Jungmin Park et al.

Power side-channel attacks (SCAs) have become a major concern to the security community due to their non-invasive feature, low-cost, and effectiveness in extracting secret information from hardware implementation of cryto algorithms. Therefore, it is imperative to evaluate if the hardware is vulnerable to SCAs during its design and validation stages. Currently, however, there is little-known effort in evaluating the vulnerability of a hardware to SCAs at early design stage. In this paper, we propose, for the first time, an automated framework, named RTL-PSC, for power side-channel leakage assessment of hardware crypto designs at register-transfer level (RTL) with built-in evaluation metrics. RTL-PSC first estimates power profile of a hardware design using functional simulation at RTL. Then it utilizes the evaluation metrics, comprising of KL divergence metric and the success rate (SR) metric based on maximum likelihood estimation to perform power side-channel leakage (PSC) vulnerability assessment at RTL. We analyze Galois-Field (GF) and Look-up Table (LUT) based AES designs using RTL-PSC and validate its effectiveness and accuracy through both gate-level simulation and FPGA results. RTL-PSC is also capable of identifying blocks inside the design that contribute the most to the PSC vulnerability which can be used for efficient countermeasure implementation.

CRMay 15, 2018
IoT Security: An End-to-End View and Case Study

Zhen Ling, Kaizheng Liu, Yiling Xu et al.

In this paper, we present an end-to-end view of IoT security and privacy and a case study. Our contribution is three-fold. First, we present our end-to-end view of an IoT system and this view can guide risk assessment and design of an IoT system. We identify 10 basic IoT functionalities that are related to security and privacy. Based on this view, we systematically present security and privacy requirements in terms of IoT system, software, networking and big data analytics in the cloud. Second, using the end-to-end view of IoT security and privacy, we present a vulnerability analysis of the Edimax IP camera system. We are the first to exploit this system and have identified various attacks that can fully control all the cameras from the manufacturer. Our real-world experiments demonstrate the effectiveness of the discovered attacks and raise the alarms again for the IoT manufacturers. Third, such vulnerabilities found in the exploit of Edimax cameras and our previous exploit of Edimax smartplugs can lead to another wave of Mirai attacks, which can be either botnets or worm attacks. To systematically understand the damage of the Mirai malware, we model propagation of the Mirai and use the simulations to validate the modeling. The work in this paper raises the alarm again for the IoT device manufacturers to better secure their products in order to prevent malware attacks like Mirai.

NEMar 14, 2018
MT-Spike: A Multilayer Time-based Spiking Neuromorphic Architecture with Temporal Error Backpropagation

Tao Liu, Zihao Liu, Fuhong Lin et al.

Modern deep learning enabled artificial neural networks, such as Deep Neural Network (DNN) and Convolutional Neural Network (CNN), have achieved a series of breaking records on a broad spectrum of recognition applications. However, the enormous computation and storage requirements associated with such deep and complex neural network models greatly challenge their implementations on resource-limited platforms. Time-based spiking neural network has recently emerged as a promising solution in Neuromorphic Computing System designs for achieving remarkable computing and power efficiency within a single chip. However, the relevant research activities have been narrowly concentrated on the biological plausibility and theoretical learning approaches, causing inefficient neural processing and impracticable multilayer extension thus significantly limitations on speed and accuracy when handling the realistic cognitive tasks. In this work, a practical multilayer time-based spiking neuromorphic architecture, namely "MT-Spike", is developed to fill this gap. With the proposed practical time-coding scheme, average delay response model, temporal error backpropagation algorithm, and heuristic loss function, "MT-Spike" achieves more efficient neural processing through flexible neural model size reduction while offering very competitive classification accuracy for realistic recognition tasks. Simulation results well validated that the algorithmic power of deep multi-layer learning can be seamlessly merged with the efficiency of time-based spiking neuromorphic architecture, demonstrating great potentials of "MT-Spike" in resource and power constrained embedded platforms.

NEMar 14, 2018
PT-Spike: A Precise-Time-Dependent Single Spike Neuromorphic Architecture with Efficient Supervised Learning

Tao Liu, Lei Jiang, Yier Jin et al.

One of the most exciting advancements in AI over the last decade is the wide adoption of ANNs, such as DNN and CNN, in many real-world applications. However, the underlying massive amounts of computation and storage requirement greatly challenge their applicability in resource-limited platforms like the drone, mobile phone, and IoT devices etc. The third generation of neural network model--Spiking Neural Network (SNN), inspired by the working mechanism and efficiency of human brain, has emerged as a promising solution for achieving more impressive computing and power efficiency within light-weighted devices (e.g. single chip). However, the relevant research activities have been narrowly carried out on conventional rate-based spiking system designs for fulfilling the practical cognitive tasks, underestimating SNN's energy efficiency, throughput, and system flexibility. Although the time-based SNN can be more attractive conceptually, its potentials are not unleashed in realistic applications due to lack of efficient coding and practical learning schemes. In this work, a Precise-Time-Dependent Single Spike Neuromorphic Architecture, namely "PT-Spike", is developed to bridge this gap. Three constituent hardware-favorable techniques: precise single-spike temporal encoding, efficient supervised temporal learning, and fast asymmetric decoding are proposed accordingly to boost the energy efficiency and data processing capability of the time-based SNN at a more compact neural network model size when executing real cognitive tasks. Simulation results show that "PT-Spike" demonstrates significant improvements in network size, processing efficiency and power consumption with marginal classification accuracy degradation when compared with the rate-based SNN and ANN under the similar network configuration.

LGFeb 14, 2018
Security Analysis and Enhancement of Model Compressed Deep Learning Systems under Adversarial Attacks

Qi Liu, Tao Liu, Zihao Liu et al.

DNN is presenting human-level performance for many complex intelligent tasks in real-world applications. However, it also introduces ever-increasing security concerns. For example, the emerging adversarial attacks indicate that even very small and often imperceptible adversarial input perturbations can easily mislead the cognitive function of deep learning systems (DLS). Existing DNN adversarial studies are narrowly performed on the ideal software-level DNN models with a focus on single uncertainty factor, i.e. input perturbations, however, the impact of DNN model reshaping on adversarial attacks, which is introduced by various hardware-favorable techniques such as hash-based weight compression during modern DNN hardware implementation, has never been discussed. In this work, we for the first time investigate the multi-factor adversarial attack problem in practical model optimized deep learning systems by jointly considering the DNN model-reshaping (e.g. HashNet based deep compression) and the input perturbations. We first augment adversarial example generating method dedicated to the compressed DNN models by incorporating the software-based approaches and mathematical modeled DNN reshaping. We then conduct a comprehensive robustness and vulnerability analysis of deep compressed DNN models under derived adversarial attacks. A defense technique named "gradient inhibition" is further developed to ease the generating of adversarial examples thus to effectively mitigate adversarial attacks towards both software and hardware-oriented DNNs. Simulation results show that "gradient inhibition" can decrease the average success rate of adversarial attacks from 87.99% to 4.77% (from 86.74% to 4.64%) on MNIST (CIFAR-10) benchmark with marginal accuracy degradation across various DNNs.

CRDec 21, 2017
Wolf in Sheep's Clothing - The Downscaling Attack Against Deep Learning Applications

Qixue Xiao, Kang Li, Deyue Zhang et al.

This paper considers security risks buried in the data processing pipeline in common deep learning applications. Deep learning models usually assume a fixed scale for their training and input data. To allow deep learning applications to handle a wide range of input data, popular frameworks, such as Caffe, TensorFlow, and Torch, all provide data scaling functions to resize input to the dimensions used by deep learning models. Image scaling algorithms are intended to preserve the visual features of an image after scaling. However, common image scaling algorithms are not designed to handle human crafted images. Attackers can make the scaling outputs look dramatically different from the corresponding input images. This paper presents a downscaling attack that targets the data scaling process in deep learning applications. By carefully crafting input data that mismatches with the dimension used by deep learning models, attackers can create deceiving effects. A deep learning application effectively consumes data that are not the same as those presented to users. The visual inconsistency enables practical evasion and data poisoning attacks to deep learning applications. This paper presents proof-of-concept attack samples to popular deep-learning-based image classification applications. To address the downscaling attacks, the paper also suggests multiple potential mitigation strategies.

CRMar 8, 2017
Execution Integrity with In-Place Encryption

Dean Sullivan, Orlando Arias, David Gens et al.

Instruction set randomization (ISR) was initially proposed with the main goal of countering code-injection attacks. However, ISR seems to have lost its appeal since code-injection attacks became less attractive because protection mechanisms such as data execution prevention (DEP) as well as code-reuse attacks became more prevalent. In this paper, we show that ISR can be extended to also protect against code-reuse attacks while at the same time offering security guarantees similar to those of software diversity, control-flow integrity, and information hiding. We present Scylla, a scheme that deploys a new technique for in-place code encryption to hide the code layout of a randomized binary, and restricts the control flow to a benign execution path. This allows us to i) implicitly restrict control-flow targets to basic block entries without requiring the extraction of a control-flow graph, ii) achieve execution integrity within legitimate basic blocks, and iii) hide the underlying code layout under malicious read access to the program. Our analysis demonstrates that Scylla is capable of preventing state-of-the-art attacks such as just-in-time return-oriented programming (JIT-ROP) and crash-resistant oriented programming (CROP). We extensively evaluate our prototype implementation of Scylla and show feasible performance overhead. We also provide details on how this overhead can be significantly reduced with dedicated hardware support.