Yang Su

CR
h-index49
24papers
10,072citations
Novelty46%
AI Score58

24 Papers

NIMar 18, 2023
Energy-Efficient Cellular-Connected UAV Swarm Control Optimization

Yang Su, Hui Zhou, Yansha Deng et al.

Cellular-connected unmanned aerial vehicle (UAV) swarm is a promising solution for diverse applications, including cargo delivery and traffic control. However, it is still challenging to communicate with and control the UAV swarm with high reliability, low latency, and high energy efficiency. In this paper, we propose a two-phase command and control (C&C) transmission scheme in a cellular-connected UAV swarm network, where the ground base station (GBS) broadcasts the common C&C message in Phase I. In Phase II, the UAVs that have successfully decoded the C&C message will relay the message to the rest of UAVs via device-to-device (D2D) communications in either broadcast or unicast mode, under latency and energy constraints. To maximize the number of UAVs that receive the message successfully within the latency and energy constraints, we formulate the problem as a Constrained Markov Decision Process to find the optimal policy. To address this problem, we propose a decentralized constrained graph attention multi-agent Deep-Q-network (DCGA-MADQN) algorithm based on Lagrangian primal-dual policy optimization, where a PID-controller algorithm is utilized to update the Lagrange Multiplier. Simulation results show that our algorithm could maximize the number of UAVs that successfully receive the common C&C under energy constraints.

SYApr 26, 2022
Interpretable Battery Cycle Life Range Prediction Using Early Degradation Data at Cell Level

Huang Zhang, Yang Su, Faisal Altaf et al.

Battery cycle life prediction using early degradation data has many potential applications throughout the battery product life cycle. For that reason, various data-driven methods have been proposed for point prediction of battery cycle life with minimum knowledge of the battery degradation mechanisms. However, managing the rapidly increasing amounts of batteries at end-of-life with lower economic and technical risk requires prediction of cycle life with quantified uncertainty, which is still lacking. The interpretability (i.e., the reason for high prediction accuracy) of these advanced data-driven methods is also worthy of investigation. Here, a Quantile Regression Forest (QRF) model, having the advantage of not assuming any specific distribution of cycle life, is introduced to make cycle life range prediction with uncertainty quantified as the width of the prediction interval, in addition to point predictions with high accuracy. The hyperparameters of the QRF model are optimized with a proposed alpha-logistic-weighted criterion so that the coverage probabilities associated with the prediction intervals are calibrated. The interpretability of the final QRF model is explored with two global model-agnostic methods, namely permutation importance and partial dependence plot.

AIJan 26Code
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Yinger Zhang, Shutong Jiang, Renhao Li et al.

While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. Meanwhile, existing LLM planning benchmarks underrepresent the active information gathering and fine-grained local constraints typical of real-world settings. To address this, we introduce DeepPlanning, a challenging benchmark for practical long-horizon agent planning. It features multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization. Evaluations on DeepPlanning show that even frontier agentic LLMs struggle with these problems, highlighting the importance of reliable explicit reasoning patterns and parallel tool use for achieving better effectiveness-efficiency trade-offs. Error analysis further points to promising directions for improving agentic LLMs over long planning horizons. We open-source the code and data to support future research.

45.3CLApr 16
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

Xiaomeng Hu, Yinger Zhang, Fei Huang et al.

AI agents are expected to perform professional work across hundreds of occupational domains (from emergency department triage to nuclear reactor safety monitoring to customs import processing), yet existing benchmarks can only evaluate agents in the few domains where public environments exist. We introduce OccuBench, a benchmark covering 100 real-world professional task scenarios across 10 industry categories and 65 specialized domains, enabled by Language Environment Simulators (LESs) that simulate domain-specific environments through LLM-driven tool response generation. Our multi-agent synthesis pipeline automatically produces evaluation instances with guaranteed solvability, calibrated difficulty, and document-grounded diversity. OccuBench evaluates agents along two complementary dimensions: task completion across professional domains and environmental robustness under controlled fault injection (explicit errors, implicit data degradation, and mixed faults). We evaluate 15 frontier models across 8 model families and find that: (1) no single model dominates all industries, as each has a distinct occupational capability profile; (2) implicit faults (truncated data, missing fields) are harder than both explicit errors (timeouts, 500s) and mixed faults, because they lack overt error signals and require the agent to independently detect data degradation; (3) larger models, newer generations, and higher reasoning effort consistently improve performance. GPT-5.2 improves by 27.5 points from minimal to maximum reasoning effort; and (4) strong agents are not necessarily strong environment simulators. Simulator quality is critical for LES-based evaluation reliability. OccuBench provides the first systematic cross-industry evaluation of AI agents on professional occupational tasks.

AIOct 30, 2025
One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning

Renhao Li, Jianhong Tu, Yang Su et al.

Reward models (RMs) play a critical role in aligning large language models (LLMs) with human preferences. Yet in the domain of tool learning, the lack of RMs specifically designed for function-calling tasks has limited progress toward more capable agentic AI. We introduce ToolRM, a family of lightweight generative RMs tailored for general tool-use scenarios. To build these models, we propose a novel pipeline that constructs pairwise preference data using rule-based scoring and multidimensional sampling. This yields ToolPref-Pairwise-30K, a diverse, balanced, and challenging dataset of critique tasks that supports reinforcement learning with verifiable feedback. To evaluate tool-use RMs, we also introduce TRBench$_{BFCL}$, a benchmark built on the agentic evaluation suite BFCL. Trained on our constructed data, models from the Qwen3-4B/8B series achieve up to 14.28% higher accuracy, substantially outperforming frontier models such as Claude 4 and OpenAI o3 in pairwise reward judgments. Beyond training objectives, ToolRM generalizes to broader critique tasks, including Best-of-N sampling and self-correction. Experiments on ACEBench highlight its effectiveness and efficiency, enabling inference-time scaling and reducing output token usage by over 66%. We release data and model checkpoints to facilitate future research.

CLMay 14, 2025
Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang et al. · tsinghua

In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework. This eliminates the need to switch between different models--such as chat-optimized models (e.g., GPT-4o) and dedicated reasoning models (e.g., QwQ-32B)--and enables dynamic mode switching based on user queries or chat templates. Meanwhile, Qwen3 introduces a thinking budget mechanism, allowing users to allocate computational resources adaptively during inference, thereby balancing latency and performance based on task complexity. Moreover, by leveraging the knowledge from the flagship models, we significantly reduce the computational resources required to build smaller-scale models, while ensuring their highly competitive performance. Empirical evaluations demonstrate that Qwen3 achieves state-of-the-art results across diverse benchmarks, including tasks in code generation, mathematical reasoning, agent tasks, etc., competitive against larger MoE models and proprietary models. Compared to its predecessor Qwen2.5, Qwen3 expands multilingual support from 29 to 119 languages and dialects, enhancing global accessibility through improved cross-lingual understanding and generation capabilities. To facilitate reproducibility and community-driven research and development, all Qwen3 models are publicly accessible under Apache 2.0.

CLDec 19, 2024
Qwen2.5 Technical Report

Qwen, An Yang, Baosong Yang et al.

In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.

CLJul 27, 2025Code
RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing

Hao Xiang, Tianyi Tang, Yang Su et al.

Recent advancements in Large Language Models (LLMs) have shown outstanding potential for role-playing applications. Evaluating these capabilities is becoming crucial yet remains challenging. Existing benchmarks mostly adopt a \textbf{character-centric} approach, simplify user-character interactions to isolated Q&A tasks, and fail to reflect real-world applications. To address this limitation, we introduce RMTBench, a comprehensive \textbf{user-centric} bilingual role-playing benchmark featuring 80 diverse characters and over 8,000 dialogue rounds. RMTBench includes custom characters with detailed backgrounds and abstract characters defined by simple traits, enabling evaluation across various user scenarios. Our benchmark constructs dialogues based on explicit user motivations rather than character descriptions, ensuring alignment with practical user applications. Furthermore, we construct an authentic multi-turn dialogue simulation mechanism. With carefully selected evaluation dimensions and LLM-based scoring, this mechanism captures the complex intention of conversations between the user and the character. By shifting focus from character background to user intention fulfillment, RMTBench bridges the gap between academic evaluation and practical deployment requirements, offering a more effective framework for assessing role-playing capabilities in LLMs. All code and datasets will be released soon. We release the datasets at https://huggingface.co/datasets/xiangh/RMTBENCH.

CRMar 19, 2021Code
Wisecr: Secure Simultaneous Code Disseminationto Many Batteryless Computational RFID Devices

Yang Su, Michael Chesser, Yansong Gao et al.

Emerging ultra-low-power tiny scale computing devices in Cyber-Physical Systems %and Internet of Things (IoT) run on harvested energy, are intermittently powered, have limited computational capability, and perform sensing and actuation functions under the control of a dedicated firmware operating without the supervisory control of an operating system. Wirelessly updating or patching the firmware of such devices is inevitable. We consider the challenging problem of simultaneous and secure firmware updates or patching for a typical class of such devices -- Computational Radio Frequency Identification (CRFID) devices. We propose Wisecr, the first secure and simultaneous wireless code dissemination mechanism to multiple devices that prevent malicious code injection attacks and intellectual property (IP) theft, whilst enabling remote attestation of code installation. Importantly, Wisecr is engineered to comply with existing ISO compliant communication protocol standards employed by CRFID devices and systems. We comprehensively evaluate Wisecr's overhead, demonstrate its implementation over standards-compliant protocols, analyze its security and implement an end-to-end realization with popular CRFID devices -- the open-source code is released on GitHub.

CLSep 29, 2023
Voice2Action: Language Models as Agent for Efficient Real-Time Interaction in Virtual Reality

Yang Su

Large Language Models (LLMs) are trained and aligned to follow natural language instructions with only a handful of examples, and they are prompted as task-driven autonomous agents to adapt to various sources of execution environments. However, deploying agent LLMs in virtual reality (VR) has been challenging due to the lack of efficiency in online interactions and the complex manipulation categories in 3D environments. In this work, we propose Voice2Action, a framework that hierarchically analyzes customized voice signals and textual commands through action and entity extraction and divides the execution tasks into canonical interaction subsets in real-time with error prevention from environment feedback. Experiment results in an urban engineering VR environment with synthetic instruction data show that Voice2Action can perform more efficiently and accurately than approaches without optimizations.

NIMar 6, 2025
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

Adnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi et al.

This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.

LGJan 8, 2025
Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions

Na Yan, Yang Su, Yansha Deng et al.

Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets, enabling task-specific adaptation while preserving data privacy. However, fine-tuning the extensive parameters in LLMs is particularly challenging in resource-constrained federated scenarios due to the significant communication and computational costs. To gain a deeper understanding of how these challenges can be addressed, this article conducts a comparative analysis three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues: 1) FedLLMs, where clients upload model parameters or gradients to enable straightforward and effective fine-tuning; 2) KD-FedLLMs, which leverage KD for efficient knowledge sharing via logits; and 3) Split-FedLLMs, which split the LLMs into two parts, with one part executed on the client and the other one on the server, to balance the computational load. Each framework is evaluated based on key performance metrics, including model accuracy, communication overhead, and client-side computational load, offering insights into their effectiveness for various federated fine-tuning scenarios. Through this analysis, we identify framework-specific optimization opportunities to enhance the efficiency of FedLLMs and discuss broader research directions, highlighting open opportunities to better adapt FedLLMs for real-world applications. A use case is presented to demonstrate the performance comparison of these three frameworks under varying configurations and settings.

MEFeb 20, 2024
Integrating Active Learning in Causal Inference with Interference: A Novel Approach in Online Experiments

Hongtao Zhu, Sizhe Zhang, Yang Su et al.

In the domain of causal inference research, the prevalent potential outcomes framework, notably the Rubin Causal Model (RCM), often overlooks individual interference and assumes independent treatment effects. This assumption, however, is frequently misaligned with the intricate realities of real-world scenarios, where interference is not merely a possibility but a common occurrence. Our research endeavors to address this discrepancy by focusing on the estimation of direct and spillover treatment effects under two assumptions: (1) network-based interference, where treatments on neighbors within connected networks affect one's outcomes, and (2) non-random treatment assignments influenced by confounders. To improve the efficiency of estimating potentially complex effects functions, we introduce an novel active learning approach: Active Learning in Causal Inference with Interference (ACI). This approach uses Gaussian process to flexibly model the direct and spillover treatment effects as a function of a continuous measure of neighbors' treatment assignment. The ACI framework sequentially identifies the experimental settings that demand further data. It further optimizes the treatment assignments under the network interference structure using genetic algorithms to achieve efficient learning outcome. By applying our method to simulation data and a Tencent game dataset, we demonstrate its feasibility in achieving accurate effects estimations with reduced data requirements. This ACI approach marks a significant advancement in the realm of data efficiency for causal inference, offering a robust and efficient alternative to traditional methodologies, particularly in scenarios characterized by complex interference patterns.

ROMar 9
Vector Field Augmented Differentiable Policy Learning for Vision-Based Drone Racing

Yang Su, Feng Yu, Yu Hu et al.

Autonomous drone racing in complex environments requires agile, high-speed flight while maintaining reliable obstacle avoidance. Differentiable-physics-based policy learning has recently demonstrated high sample efficiency and remarkable performance across various tasks, including agile drone flight and quadruped locomotion. However, applying such methods to drone racing remains difficult, as key objective like gate traversal are inherently hard to express as smooth, differentiable losses. To address these challenges, we propose DiffRacing, a novel vector field-augmented differentiable policy learning framework. DiffRacing integrates differentiable losses and vector fields into the training process to provide continuous and stable gradient signals, balancing obstacle avoidance and high-speed gate traversal. In addition, a differentiable Delta Action Model compensates for dynamics mismatch, enabling efficient sim-to-real transfer without explicit system identification. Extensive simulation and real-world experiments demonstrate that DiffRacing achieves superior sample efficiency, faster convergence, and robust flight performance, thereby demonstrating that vector fields can augment traditional gradient-based policy learning with a task-specific geometric prior.

LGSep 1, 2025
Communication-Aware Knowledge Distillation for Federated LLM Fine-Tuning over Wireless Networks

Xinlu Zhang, Na Yan, Yang Su et al.

Federated learning (FL) for large language models (LLMs) offers a privacy-preserving scheme, enabling clients to collaboratively fine-tune locally deployed LLMs or smaller language models (SLMs) without exchanging raw data. While parameter-sharing methods in traditional FL models solves number of technical challenges, they still incur high communication overhead and struggle with adapting to heterogeneous model architectures. Federated distillation, a framework for mutual knowledge transfer via shared logits, typically offers lower communication overhead than parameter-sharing methods. However, transmitting logits from LLMs remains challenging for bandwidth-limited clients due to their high dimensionality. In this work, we focus on a federated LLM distillation with efficient communication overhead. To achieve this, we first propose an adaptive Top-k logit selection mechanism, dynamically sparsifying logits according to real-time communication conditions. Then to tackle the dimensional inconsistency introduced by the adaptive sparsification, we design an adaptive logits aggregation scheme, effectively alleviating the artificial and uninformative inputs introduced by conventional zero-padding methods. Finally, to enhance the distillation effect, we incorporate LoRA-adapted hidden-layer projection from LLM into the distillation loss, reducing the communication overhead further while providing richer representation. Experimental results demonstrate that our scheme achieves superior performance compared to baseline methods while effectively reducing communication overhead by approximately 50%.

CVAug 27, 2025
SDiFL: Stable Diffusion-Driven Framework for Image Forgery Localization

Yang Su, Shunquan Tan, Jiwu Huang

Driven by the new generation of multi-modal large models, such as Stable Diffusion (SD), image manipulation technologies have advanced rapidly, posing significant challenges to image forensics. However, existing image forgery localization methods, which heavily rely on labor-intensive and costly annotated data, are struggling to keep pace with these emerging image manipulation technologies. To address these challenges, we are the first to integrate both image generation and powerful perceptual capabilities of SD into an image forensic framework, enabling more efficient and accurate forgery localization. First, we theoretically show that the multi-modal architecture of SD can be conditioned on forgery-related information, enabling the model to inherently output forgery localization results. Then, building on this foundation, we specifically leverage the multimodal framework of Stable DiffusionV3 (SD3) to enhance forgery localization performance.We leverage the multi-modal processing capabilities of SD3 in the latent space by treating image forgery residuals -- high-frequency signals extracted using specific highpass filters -- as an explicit modality. This modality is fused into the latent space during training to enhance forgery localization performance. Notably, our method fully preserves the latent features extracted by SD3, thereby retaining the rich semantic information of the input image. Experimental results show that our framework achieves up to 12% improvements in performance on widely used benchmarking datasets compared to current state-of-the-art image forgery localization models. Encouragingly, the model demonstrates strong performance on forensic tasks involving real-world document forgery images and natural scene forging images, even when such data were entirely unseen during training.

LGMay 13, 2025
PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts

Yang Su, Na Yan, Yansha Deng et al.

Large language models (LLMs) hosted on cloud servers alleviate the computational and storage burdens on local devices but raise privacy concerns due to sensitive data transmission and require substantial communication bandwidth, which is challenging in constrained environments. In contrast, small language models (SLMs) running locally enhance privacy but suffer from limited performance on complex tasks. To balance computational cost, performance, and privacy protection under bandwidth constraints, we propose a privacy-aware wireless collaborative mixture of experts (PWC-MoE) framework. Specifically, PWC-MoE employs a sparse privacy-aware gating network to dynamically route sensitive tokens to privacy experts located on local clients, while non-sensitive tokens are routed to non-privacy experts located at the remote base station. To achieve computational efficiency, the gating network ensures that each token is dynamically routed to and processed by only one expert. To enhance scalability and prevent overloading of specific experts, we introduce a group-wise load-balancing mechanism for the gating network that evenly distributes sensitive tokens among privacy experts and non-sensitive tokens among non-privacy experts. To adapt to bandwidth constraints while preserving model performance, we propose a bandwidth-adaptive and importance-aware token offloading scheme. This scheme incorporates an importance predictor to evaluate the importance scores of non-sensitive tokens, prioritizing the most important tokens for transmission to the base station based on their predicted importance and the available bandwidth. Experiments demonstrate that the PWC-MoE framework effectively preserves privacy and maintains high performance even in bandwidth-constrained environments, offering a practical solution for deploying LLMs in privacy-sensitive and bandwidth-limited scenarios.

LGNov 10, 2024
HAFLQ: Heterogeneous Adaptive Federated LoRA Fine-tuned LLM with Quantization

Yang Su, Na Yan, Yansha Deng et al.

Federated fine-tuning of pre-trained Large Language Models (LLMs) enables task-specific adaptation across diverse datasets while preserving privacy. However, challenges such as high computational and memory demands, heterogeneous client resources, bandwidth constraints, and ineffective global aggregation hinder its efficiency. To address these issues, we propose HAFLQ (Heterogeneous Adaptive Federated Low-Rank Adaptation Fine-tuned LLM with Quantization), a novel framework for efficient and scalable federated fine-tuning of LLMs in heterogeneous environments. To reduce memory and computation demands, we propose a salience-driven adaptive LLM quantization framework that evaluates the importance of transformer blocks using a salience metric and applies adaptive block-wise quantization accordingly. To handle heterogeneous computational capabilities, we propose an importance-based parameter truncation and freezing scheme. To address communication bottlenecks, we propose an importance-aware bandwidth-adaptive quantization method, which dynamically adjusts parameter precision based on importance and bandwidth constraints. To improve global model aggregation, we propose an adaptive rank-1 matrix-level aggregation strategy, which prevents information dilution and accelerates convergence by aggregating only updated rank-1 matrices from clients. Experimental results on the text classification task demonstrate that HAFLQ reduces memory usage by 31%, lowers communication cost by 49%, improves accuracy by 50%, and achieves faster convergence compared to the baseline method.

CRJan 19, 2022
Leaving Your Things Unattended is No Joke! Memory Bus Snooping and Open Debug Interface Exploits

Yang Su, Damith C. Ranasinghe

Internet of Things devices are widely adopted by the general population. People today are more connected than ever before. The widespread use and low-cost driven construction of these devices in a competitive marketplace render Internet-connected devices an easier and attractive target for malicious actors. This paper demonstrates non-invasive physical attacks against IoT devices in two case studies in a tutorial style format. The study focuses on demonstrating the: i)exploitation of debug interfaces, often left open after manufacture; and ii)the exploitation of exposed memory buses. We illustrate a person could commit such attacks with entry-level knowledge, inexpensive equipment, and limited time (in 8 to 25 minutes).

CRSep 7, 2021
NoisFre: Noise-Tolerant Memory Fingerprints from Commodity Devices for Security Functions

Yansong Gao, Yang Su, Surya Nepal et al.

Building hardware security primitives with on-device memory fingerprints is a compelling proposition given the ubiquity of memory in electronic devices, especially for low-end Internet of Things devices for which cryptographic modules are often unavailable. However, the use of fingerprints in security functions is challenged by the small, but unpredictable variations in fingerprint reproductions from the same device due to measurement noise. Our study formulates a novel and pragmatic approach to achieve highly reliable fingerprints from device memories. We investigate the transformation of raw fingerprints into a noise-tolerant space where the generation of fingerprints is intrinsically highly reliable. We derive formal performance bounds to support practitioners to easily adopt our methods for applications. Subsequently, we demonstrate the expressive power of our formalization by using it to investigate the practicability of extracting noise-tolerant fingerprints from commodity devices. Together with extensive simulations, we have employed 119 chips from five different manufacturers for extensive experimental validations. Our results, including an end-to-end implementation demonstration with a low-cost wearable Bluetooth inertial sensor capable of on-demand and runtime key generation, show that key generators with failure rates less than $10^-6$ can be efficiently obtained with noise-tolerant fingerprints with a single fingerprint snapshot to support ease-of-enrollment.

CRFeb 8, 2019
Hash Functions and Benchmarks for Resource Constrained Passive Devices: A Preliminary Study

Yang Su, Yansong Gao, Omid Kavehei et al.

Recently, we have witnessed the emergence of intermittently powered computational devices, an early example is the Intel WISP (Wireless Identification and Sensing Platform). How we engineer basic security services to realize mutual authentication, confidentiality and preserve privacy of information collected, stored and transmitted by, and establish the veracity of measurements taken from, such devices remain an open challenge; especially for batteryless and intermittently powered devices. While the cryptographic community has significantly progressed lightweight (in terms of area overhead) security primitives for low cost and power efficient hardware implementations, lightweight software implementations of security primitives for resource constrained devices are less investigated. Especially, the problem of providing security for intermittently powered computational devices is unexplored. In this paper, we illustrate the unique challenges posed by an emerging class of intermittently powered and energy constrained computational IoT devices for engineering security solutions. We focus on the construction and evaluation of a basic hash primitive---both existing cryptographic hash functions and non-cryptographic hash functions built upon lightweight block ciphers. We provide software implementation benchmarks for eight primitives on a low power and resource limited computational device, and outline an execution model for these primitives under intermittent powering.

CRFeb 8, 2019
Building Secure SRAM PUF Key Generators on Resource Constrained Devices

Yansong Gao, Yang Su, Wei Yang et al.

A securely maintained key is the premise upon which data stored and transmitted by ubiquitously deployed resource limited devices, such as those in the Internet of Things (IoT), are protected. However, many of these devices lack a secure non-volatile memory (NVM) for storing keys because of cost constraints. Silicon physical unclonable functions (PUFs) offering unique device specific secrets to electronic commodities are a low-cost alternative to secure NVM. As a physical hardware security primitive, reliability of a PUF is affected by thermal noise and changes in environmental conditions; consequently, PUF responses cannot be directly employed as cryptographic keys. A fuzzy extractor can turn noisy PUF responses into usable cryptographic keys. However, a fuzzy extractor is not immediately mountable on (highly) resource constrained devices due to its implementation overhead. We present a methodology for constructing a lightweight and secure PUF key generator for resource limited devices. In particular, we focus on PUFs constructed from pervasively embedded SRAM in modern microcontroller units and use a batteryless computational radio frequency identification (CRFID) device as a representative resource constrained IoT device in a case study.

CRJul 27, 2018
SecuCode: Intrinsic PUF Entangled Secure Wireless Code Dissemination for Computational RFID Devices

Yang Su, Yansong Gao, Michael Chesser et al.

The simplicity of deployment and perpetual operation of energy harvesting devices provides a compelling proposition for a new class of edge devices for the Internet of Things. In particular, Computational Radio Frequency Identification (CRFID) devices are an emerging class of battery-free, computational, sensing enhanced devices that harvest all of their energy for operation. Despite wireless connectivity and powering, secure wireless firmware updates remains an open challenge for CRFID devices due to: intermittent powering, limited computational capabilities, and the absence of a supervisory operating system. We present, for the first time, a secure wireless code dissemination (SecuCode) mechanism for CRFIDs by entangling a device intrinsic hardware security primitive Static Random Access Memory Physical Unclonable Function (SRAM PUF) to a firmware update protocol. The design of SecuCode: i) overcomes the resource-constrained and intermittently powered nature of the CRFID devices; ii) is fully compatible with existing communication protocols employed by CRFID devices in particular, ISO-18000-6C protocol; and ii) is built upon a standard and industry compliant firmware compilation and update method realized by extending a recent framework for firmware updates provided by Texas Instruments. We build an end-to-end SecuCode implementation and conduct extensive experiments to demonstrate standards compliance, evaluate performance and security.

CRMay 19, 2018
Lightweight (Reverse) Fuzzy Extractor with Multiple Referenced PUF Responses

Yansong Gao, Yang Su, Lei Xu et al.

A Physical unclonable functions (PUF), alike a fingerprint, exploits manufacturing randomness to endow each physical item with a unique identifier. One primary PUF application is the secure derivation of volatile cryptographic keys using a fuzzy extractor comprising of two procedures: i) secure sketch; and ii) entropy extraction. Although the entropy extractor can be lightweight, the overhead of the secure sketch responsible correcting naturally noisy PUF responses is usually costly. We observe that, in general, response unreliability with respect to a enrolled reference measurement increases with increasing differences between the in-the-field PUF operating condition and the operating condition used in evaluating the enrolled reference response. For the first time, we exploit such an important but inadvertent observation. In contrast to the conventional single reference response enrollment, we propose enrolling multiple reference responses (MRR) subject to the same challenge but under multiple distinct operating conditions. The critical observation here is that one of the reference operating conditions is likely to be closer to the operating condition of the field deployed PUF, thus, resulting in minimizing the expected unreliability when compared to the single reference under the nominal condition. Overall, MRR greatly reduces the demand for the expected number of erroneous bits for correction and, subsequently, achieve a significant reduction in the error correction overhead. The significant implementation efficiency gains from the proposed MRR method is demonstrated from software implementations of fuzzy extractors on batteryless resource constraint computational radio frequency identification devices, where realistic PUF data is collected from the embedded intrinsic SRAM PUFs.