Chip Hong Chang

CR
h-index12
5papers
266citations
Novelty64%
AI Score54

5 Papers

CRAug 2, 2023
Mercury: An Automated Remote Side-channel Attack to Nvidia Deep Learning Accelerator

Xiaobei Yan, Xiaoxuan Lou, Guowen Xu et al.

DNN accelerators have been widely deployed in many scenarios to speed up the inference process and reduce the energy consumption. One big concern about the usage of the accelerators is the confidentiality of the deployed models: model inference execution on the accelerators could leak side-channel information, which enables an adversary to preciously recover the model details. Such model extraction attacks can not only compromise the intellectual property of DNN models, but also facilitate some adversarial attacks. Although previous works have demonstrated a number of side-channel techniques to extract models from DNN accelerators, they are not practical for two reasons. (1) They only target simplified accelerator implementations, which have limited practicality in the real world. (2) They require heavy human analysis and domain knowledge. To overcome these limitations, this paper presents Mercury, the first automated remote side-channel attack against the off-the-shelf Nvidia DNN accelerator. The key insight of Mercury is to model the side-channel extraction process as a sequence-to-sequence problem. The adversary can leverage a time-to-digital converter (TDC) to remotely collect the power trace of the target model's inference. Then he uses a learning model to automatically recover the architecture details of the victim model from the power trace without any prior knowledge. The adversary can further use the attention mechanism to localize the leakage points that contribute most to the attack. Evaluation results indicate that Mercury can keep the error rate of model extraction below 1%.

CVMar 25
High-Fidelity Face Content Recovery via Tamper-Resilient Versatile Watermarking

Peipeng Yu, Jinfeng Xie, Chengfu Ou et al.

The proliferation of AIGC-driven face manipulation and deepfakes poses severe threats to media provenance, integrity, and copyright protection. Prior versatile watermarking systems typically rely on embedding explicit localization payloads, which introduces a fidelity--functionality trade-off: larger localization signals degrade visual quality and often reduce decoding robustness under strong generative edits. Moreover, existing methods rarely support content recovery, limiting their forensic value when original evidence must be reconstructed. To address these challenges, we present VeriFi, a versatile watermarking framework that unifies copyright protection, pixel-level manipulation localization, and high-fidelity face content recovery. VeriFi makes three key contributions: (1) it embeds a compact semantic latent watermark that serves as an content-preserving prior, enabling faithful restoration even after severe manipulations; (2) it achieves fine-grained localization without embedding localization-specific artifacts by correlating image features with decoded provenance signals; and (3) it introduces an AIGC attack simulator that combines latent-space mixing with seamless blending to improve robustness to realistic deepfake pipelines. Extensive experiments on CelebA-HQ and FFHQ show that VeriFi consistently outperforms strong baselines in watermark robustness, localization accuracy, and recovery quality, providing a practical and verifiable defense for deepfake forensics.

CLMay 19, 2023Code
Zero-Shot Text Classification via Self-Supervised Tuning

Chaoqun Liu, Wenxuan Zhang, Guizhen Chen et al.

Existing solutions to zero-shot text classification either conduct prompting with pre-trained language models, which is sensitive to the choices of templates, or rely on large-scale annotated data of relevant tasks for meta-tuning. In this work, we propose a new paradigm based on self-supervised learning to solve zero-shot text classification tasks by tuning the language models with unlabeled data, called self-supervised tuning. By exploring the inherent structure of free texts, we propose a new learning objective called first sentence prediction to bridge the gap between unlabeled data and text classification tasks. After tuning the model to learn to predict the first sentence in a paragraph based on the rest, the model is able to conduct zero-shot inference on unseen tasks such as topic classification and sentiment analysis. Experimental results show that our model outperforms the state-of-the-art baselines on 7 out of 10 tasks. Moreover, the analysis reveals that our model is less sensitive to the prompt design. Our code and pre-trained models are publicly available at https://github.com/DAMO-NLP-SG/SSTuning .

CRMay 8
A Unified Open-Set Framework for Scalable PUF-Based Authentication of Heterogeneous IoT Devices

Xin Wang, Peichun Hua, Chip Hong Chang et al.

As modern cyber systems scale to include large populations of heterogeneous IoT devices, securing them against impersonation and forgery is a critical cybersecurity challenge. Physical Unclonable Functions (PUFs) offer a lightweight, hardware-rooted trust anchor for IoT security. However, different PUF architectures possess distinct challenge-response spaces and raw response reliabilities, making existing authentication protocols PUF-type specific. To bridge this interoperability bottleneck, this paper proposes a scalable, helper-data-free, open-set PUF authentication framework that leverages an OpenGAN-based classifier to manage heterogeneous fleets of IoT devices. Our method addresses the limitations of traditional database-centric and digital-twin modeling methods by encoding raw responses from diverse PUF types, including strong, weak and hybrid PUFs, into a unified image representation. This enables robust, single-pass classification and impostor rejection. We integrate the classifier into a generic protocol employing hybrid encryption and Bloom filter-based replay detection. Evaluated across four different types of noisy PUF data (Arbiter, SRAM, DRAM, and heterogeneous PUFs), our framework achieves 100% closed-set accuracy and near-zero open-set error rates with up to 45 devices, a significant improvement over the 3 to 5 devices in prior classification-based approaches. Prototyped on a Raspberry Pi, our framework completes one authentication cycle within 0.67 s, approximately 30x faster than the state-of-the-art open-set baselines.

CVMar 19, 2025
Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection

Peipeng Yu, Jianwei Fei, Hui Gao et al.

Current Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in understanding multimodal data, but their potential remains underexplored for deepfake detection due to the misalignment of their knowledge and forensics patterns. To this end, we present a novel framework that unlocks LVLMs' potential capabilities for deepfake detection. Our framework includes a Knowledge-guided Forgery Detector (KFD), a Forgery Prompt Learner (FPL), and a Large Language Model (LLM). The KFD is used to calculate correlations between image features and pristine/deepfake image description embeddings, enabling forgery classification and localization. The outputs of the KFD are subsequently processed by the Forgery Prompt Learner to construct fine-grained forgery prompt embeddings. These embeddings, along with visual and question prompt embeddings, are fed into the LLM to generate textual detection responses. Extensive experiments on multiple benchmarks, including FF++, CDF2, DFD, DFDCP, DFDC, and DF40, demonstrate that our scheme surpasses state-of-the-art methods in generalization performance, while also supporting multi-turn dialogue capabilities.