LGDec 10, 2025
Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMsSohely Jahan, Ruimin Sun
As medical large language models (LLMs) become increasingly integrated into clinical workflows, concerns around alignment robustness, and safety are escalating. Prior work on model extraction has focused on classification models or memorization leakage, leaving the vulnerability of safety-aligned generative medical LLMs underexplored. We present a black-box distillation attack that replicates the domain-specific reasoning of safety-aligned medical LLMs using only output-level access. By issuing 48,000 instruction queries to Meditron-7B and collecting 25,000 benign instruction response pairs, we fine-tune a LLaMA3 8B surrogate via parameter efficient LoRA under a zero-alignment supervision setting, requiring no access to model weights, safety filters, or training data. With a cost of $12, the surrogate achieves strong fidelity on benign inputs while producing unsafe completions for 86% of adversarial prompts, far exceeding both Meditron-7B (66%) and the untuned base model (46%). This reveals a pronounced functional-ethical gap, task utility transfers, while alignment collapses. To analyze this collapse, we develop a dynamic adversarial evaluation framework combining Generative Query (GQ)-based harmful prompt generation, verifier filtering, category-wise failure analysis, and adaptive Random Search (RS) jailbreak attacks. We also propose a layered defense system, as a prototype detector for real-time alignment drift in black-box deployments. Our findings show that benign-only black-box distillation exposes a practical and under-recognized threat: adversaries can cheaply replicate medical LLM capabilities while stripping safety mechanisms, underscoring the need for extraction-aware safety monitoring.
CROct 27, 2025Code
PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMsJiaqi Xue, Yifei Zhao, Mansour Al Ghanim et al.
Text watermarking for large language models (LLMs) enables model owners to verify text origin and protect intellectual property. While watermarking methods for closed-source LLMs are relatively mature, extending them to open-source models remains challenging, as developers cannot control the decoding process. Consequently, owners of open-source LLMs lack practical means to verify whether text was generated by their models. A core difficulty lies in embedding watermarks directly into model weights without hurting detectability. A promising idea is to distill watermarks from a closed-source model into an open one, but this suffers from (i) poor detectability due to mismatch between learned and predefined patterns, and (ii) fragility to downstream modifications such as fine-tuning or model merging. To overcome these limitations, we propose PRO, a Precise and Robust text watermarking method for open-source LLMs. PRO jointly trains a watermark policy model with the LLM, producing patterns that are easier for the model to learn and more consistent with detection criteria. A regularization term further simulates downstream perturbations and penalizes degradation in watermark detectability, ensuring robustness under model edits. Experiments on open-source LLMs (e.g., LLaMA-3.2, LLaMA-3, Phi-2) show that PRO substantially improves both watermark detectability and resilience to model modifications.
CROct 22, 2025
SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for Large Language Model DeploymentTushar Nayan, Ziqi Zhang, Ruimin Sun
With the increasing deployment of Large Language Models (LLMs) on mobile and edge platforms, securing them against model extraction attacks has become a pressing concern. However, protecting model privacy without sacrificing the performance benefits of untrusted AI accelerators, such as GPUs, presents a challenging trade-off. In this paper, we initiate the study of high-performance execution on LLMs and present SecureInfer, a hybrid framework that leverages a heterogeneous Trusted Execution Environments (TEEs)-GPU architecture to isolate privacy-critical components while offloading compute-intensive operations to untrusted accelerators. Building upon an outsourcing scheme, SecureInfer adopts an information-theoretic and threat-informed partitioning strategy: security-sensitive components, including non-linear layers, projection of attention head, FNN transformations, and LoRA adapters, are executed inside an SGX enclave, while other linear operations (matrix multiplication) are performed on the GPU after encryption and are securely restored within the enclave. We implement a prototype of SecureInfer using the LLaMA-2 model and evaluate it across performance and security metrics. Our results show that SecureInfer offers strong security guarantees with reasonable performance, offering a practical solution for secure on-device model inference.
CRJan 13, 2022
D-Box: DMA-enabled Compartmentalization for Embedded ApplicationsAlejandro Mera, Yi Hui Chen, Ruimin Sun et al.
Embedded and Internet-of-Things (IoT) devices have seen an increase in adoption in many domains. The security of these devices is of great importance as they are often used to control critical infrastructure, medical devices, and vehicles. Existing solutions to isolate microcontroller (MCU) resources in order to increase their security face significant challenges such as specific hardware unavailability, Memory Protection Unit (MPU) limitations and a significant lack of Direct Memory Access (DMA) support. Nevertheless, DMA is fundamental for the power and performance requirements of embedded applications. In this paper, we present D-Box, a systematic approach to enable secure DMA operations for compartmentalization solutions of embedded applications using real-time operating systems (RTOS). D-Box defines a reference architecture and a workflow to protect DMA operations holistically. It provides practical methods to harden the kernel and define capability-based security policies for easy definition of DMA operations with strong security properties. We implemented a D-Box prototype for the Cortex-M3/M4 on top of the popular FreeRTOS-MPU (F-MPU). The D-Box procedures and a stricter security model enabled DMA operations, yet it exposed 41 times less ROP (return-orienting-programming) gadgets when compared with the standard F-MPU. D-Box adds only a 2% processor overhead while reducing the power consumption of peripheral operation benchmarks by 18.2%. The security properties and performance of D-Box were tested and confirmed on a real-world case study of a Programmable Logic Controller (PLC) application.
LGMay 20, 2021
Online Binary Models are Promising for Distinguishing Temporally Consistent Computer Usage ProfilesLuiz Giovanini, Fabrício Ceschin, Mirela Silva et al.
This paper investigates whether computer usage profiles comprised of process-, network-, mouse-, and keystroke-related events are unique and consistent over time in a naturalistic setting, discussing challenges and opportunities of using such profiles in applications of continuous authentication. We collected ecologically-valid computer usage profiles from 31 MS Windows 10 computer users over 8 weeks and submitted this data to comprehensive machine learning analysis involving a diverse set of online and offline classifiers. We found that: (i) profiles were mostly consistent over the 8-week data collection period, with most (83.9%) repeating computer usage habits on a daily basis; (ii) computer usage profiling has the potential to uniquely characterize computer users (with a maximum F-score of 99.90%); (iii) network-related events were the most relevant features to accurately recognize profiles (95.69% of the top features distinguishing users were network-related); and (iv) binary models were the most well-suited for profile recognition, with better results achieved in the online setting compared to the offline setting (maximum F-score of 99.90% vs. 95.50%).
CRNov 11, 2020
ShadowNet: A Secure and Efficient On-device Model Inference System for Convolutional Neural NetworksZhichuang Sun, Ruimin Sun, Changming Liu et al.
With the increased usage of AI accelerators on mobile and edge devices, on-device machine learning (ML) is gaining popularity. Thousands of proprietary ML models are being deployed today on billions of untrusted devices. This raises serious security concerns about model privacy. However, protecting model privacy without losing access to the untrusted AI accelerators is a challenging problem. In this paper, we present a novel on-device model inference system, ShadowNet. ShadowNet protects the model privacy with Trusted Execution Environment (TEE) while securely outsourcing the heavy linear layers of the model to the untrusted hardware accelerators. ShadowNet achieves this by transforming the weights of the linear layers before outsourcing them and restoring the results inside the TEE. The non-linear layers are also kept secure inside the TEE. ShadowNet's design ensures efficient transformation of the weights and the subsequent restoration of the results. We build a ShadowNet prototype based on TensorFlow Lite and evaluate it on five popular CNNs, namely, MobileNet, ResNet-44, MiniVGG, ResNet-404, and YOLOv4-tiny. Our evaluation shows that ShadowNet achieves strong security guarantees with reasonable performance, offering a practical solution for secure on-device model inference.
CRJun 9, 2020
SoK: Attacks on Industrial Control Logic and Formal Verification-Based DefensesRuimin Sun, Alejandro Mera, Long Lu et al.
Programmable Logic Controllers (PLCs) play a critical role in the industrial control systems. Vulnerabilities in PLC programs might lead to attacks causing devastating consequences to the critical infrastructure, as shown in Stuxnet and similar attacks. In recent years, we have seen an exponential increase in vulnerabilities reported for PLC control logic. Looking back on past research, we found extensive studies explored control logic modification attacks, as well as formal verification-based security solutions. We performed systematization on these studies, and found attacks that can compromise a full chain of control and evade detection. However, the majority of the formal verification research investigated ad-hoc techniques targeting PLC programs. We discovered challenges in every aspect of formal verification, rising from (1) the ever-expanding attack surface from evolved system design, (2) the real-time constraint during the program execution, and (3) the barrier in security evaluation given proprietary and vendor-specific dependencies on different techniques. Based on the knowledge systematization, we provide a set of recommendations for future research directions, and we highlight the need of defending security issues besides safety issues.
CRFeb 18, 2020
Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile AppsZhichuang Sun, Ruimin Sun, Long Lu et al.
On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered as core intellectual properties of model owners, are now stored on billions of untrusted devices and subject to potential thefts. Leaked models can cause both severe financial loss and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? What impacts can (stolen) models incur? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from the US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. We found that, alarmingly, 41% of ML apps do not protect their models at all, which can be trivially stolen from app packages. Even for those apps that use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products and used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial and security impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked; attackers are highly motivated to steal such models. Drawn from our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.
CRFeb 7, 2018
A Praise for Defensive Programming: Leveraging Uncertainty for Effective Malware MitigationRuimin Sun, Marcus Botacin, Nikolaos Sapountzis et al.
A promising avenue for improving the effectiveness of behavioral-based malware detectors would be to combine fast traditional machine learning detectors with high-accuracy, but time-consuming deep learning models. The main idea would be to place software receiving borderline classifications by traditional machine learning methods in an environment where uncertainty is added, while software is analyzed by more time-consuming deep learning models. The goal of uncertainty would be to rate-limit actions of potential malware during the time consuming deep analysis. In this paper, we present a detailed description of the analysis and implementation of CHAMELEON, a framework for realizing this uncertain environment for Linux. CHAMELEON offers two environments for software: (i) standard - for any software identified as benign by conventional machine learning methods and (ii) uncertain - for software receiving borderline classifications when analyzed by these conventional machine learning methods. The uncertain environment adds obstacles to software execution through random perturbations applied probabilistically on selected system calls. We evaluated CHAMELEON with 113 applications and 100 malware samples for Linux. Our results showed that at threshold 10%, intrusive and non-intrusive strategies caused approximately 65% of malware to fail accomplishing their tasks, while approximately 30% of the analyzed benign software to meet with various levels of disruption. With a dynamic, per-system call threshold, CHAMELEON caused 92% of the malware to fail, and only 10% of the benign software to be disrupted. We also found that I/O-bound software was three times more affected by uncertainty than CPU-bound software. Further, we analyzed the logs of software crashed with non-intrusive strategies, and found that some crashes are due to the software bugs.
CRDec 4, 2017
Learning Fast and Slow: PROPEDEUTICA for Real-time Malware DetectionRuimin Sun, Xiaoyong Yuan, Pan He et al.
Existing malware detectors on safety-critical devices have difficulties in runtime detection due to the performance overhead. In this paper, we introduce PROPEDEUTICA, a framework for efficient and effective real-time malware detection, leveraging the best of conventional machine learning (ML) and deep learning (DL) techniques. In PROPEDEUTICA, all software start execution are considered as benign and monitored by a conventional ML classifier for fast detection. If the software receives a borderline classification from the ML detector (e.g. the software is 50% likely to be benign and 50% likely to be malicious), the software will be transferred to a more accurate, yet performance demanding DL detector. To address spatial-temporal dynamics and software execution heterogeneity, we introduce a novel DL architecture (DEEPMALWARE) for PROPEDEUTICA with multi-stream inputs. We evaluated PROPEDEUTICA with 9,115 malware samples and 1,338 benign software from various categories for the Windows OS. With a borderline interval of [30%-70%], PROPEDEUTICA achieves an accuracy of 94.34% and a false-positive rate of 8.75%, with 41.45% of the samples moved for DEEPMALWARE analysis. Even using only CPU, PROPEDEUTICA can detect malware within less than 0.1 seconds.