AISep 14, 2023
Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and ChallengesFei Dou, Jin Ye, Geng Yuan et al.
Artificial General Intelligence (AGI), possessing the capacity to comprehend, learn, and execute tasks with human cognitive abilities, engenders significant anticipation and intrigue across scientific, commercial, and societal arenas. This fascination extends particularly to the Internet of Things (IoT), a landscape characterized by the interconnection of countless devices, sensors, and systems, collectively gathering and sharing data to enable intelligent decision-making and automation. This research embarks on an exploration of the opportunities and challenges towards achieving AGI in the context of the IoT. Specifically, it starts by outlining the fundamental principles of IoT and the critical role of Artificial Intelligence (AI) in IoT systems. Subsequently, it delves into AGI fundamentals, culminating in the formulation of a conceptual framework for AGI's seamless integration within IoT. The application spectrum for AGI-infused IoT is broad, encompassing domains ranging from smart grids, residential environments, manufacturing, and transportation to environmental monitoring, agriculture, healthcare, and education. However, adapting AGI to resource-constrained IoT settings necessitates dedicated research efforts. Furthermore, the paper addresses constraints imposed by limited computing resources, intricacies associated with large-scale IoT communication, as well as the critical concerns pertaining to security and privacy.
33.2CRMay 16
Stop Starving or Stuffing Me: Boosting Firmware Fuzzing Efficiency with On-demand Input DeliveryShandian Shen, Wei Zhou, Keming Zhao et al.
Firmware fuzzing has gained attention for identifying firmware bugs. However, current approaches often directly integrate fuzzing tools for general software. General software receives input as it encounters I/O functions, but firmware input can be received asynchronously and independently of the firmware's execution, with uncertain timing and quantity. Without full awareness of firmware's exceptions, existing solutions often imprudently deliver fuzzer-generated input to the firmware in an ad-hoc way. This either overwhelms the processing function of the firmware (stuffing) or fails to deliver enough input data to trigger input processing functions (starving). In both cases, fuzzing capability is weakened. In this paper, we comprehensively investigate the input delivery issue. To determine the optimal timing and quantity for delivering test cases, we leverage the fact that firmware has to check input availability before using data. So we employ static and dynamic analysis to map each input processing route into three stages: input retrieval, availability check, and processing. This recovered semantic information allows the fuzzer to accurately deliver input at the availability check points within the expected length range. For multiple input routes problem, we also optimize the scheduling algorithm to reach more diverse routes. Our prototype, named FIDO, can serve as an add-on to existing firmware fuzzers to enhance their test-case delivery effectiveness. Compared to ad-hoc input delivery methods used in Fuzzware and MULTIFUZZ, FIDO increases their median code coverage by up to 115% and 54%, respectively. Compared to SEmu, which requires humans to manually specify input delivery points, FIDO still improves its coverage by up to 19%. As a result, FIDO discovers known bugs significantly faster and also identifies five previously unknown bugs.
ROFeb 9, 2024
LLMs for Coding and Robotics EducationPeng Shu, Huaqin Zhao, Hanqi Jiang et al.
Large language models and multimodal large language models have revolutionized artificial intelligence recently. An increasing number of regions are now embracing these advanced technologies. Within this context, robot coding education is garnering increasing attention. To teach young children how to code and compete in robot challenges, large language models are being utilized for robot code explanation, generation, and modification. In this paper, we highlight an important trend in robot coding education. We test several mainstream large language models on both traditional coding tasks and the more challenging task of robot code generation, which includes block diagrams. Our results show that GPT-4V outperforms other models in all of our tests but struggles with generating block diagram images.
CROct 11, 2025
MetaBreak: Jailbreaking Online LLM Services via Special Token ManipulationWentian Zhu, Zhen Xiang, Wei Niu et al.
Unlike regular tokens derived from existing text corpora, special tokens are artificially created to annotate structured conversations during the fine-tuning process of Large Language Models (LLMs). Serving as metadata of training data, these tokens play a crucial role in instructing LLMs to generate coherent and context-aware responses. We demonstrate that special tokens can be exploited to construct four attack primitives, with which malicious users can reliably bypass the internal safety alignment of online LLM services and circumvent state-of-the-art (SOTA) external content moderation systems simultaneously. Moreover, we found that addressing this threat is challenging, as aggressive defense mechanisms-such as input sanitization by removing special tokens entirely, as suggested in academia-are less effective than anticipated. This is because such defense can be evaded when the special tokens are replaced by regular ones with high semantic similarity within the tokenizer's embedding space. We systemically evaluated our method, named MetaBreak, on both lab environment and commercial LLM platforms. Our approach achieves jailbreak rates comparable to SOTA prompt-engineering-based solutions when no content moderation is deployed. However, when there is content moderation, MetaBreak outperforms SOTA solutions PAP and GPTFuzzer by 11.6% and 34.8%, respectively. Finally, since MetaBreak employs a fundamentally different strategy from prompt engineering, the two approaches can work synergistically. Notably, empowering MetaBreak on PAP and GPTFuzzer boosts jailbreak rates by 24.3% and 20.2%, respectively.
CRFeb 7, 2022
$μ$AFL: Non-intrusive Feedback-driven Fuzzing for Microcontroller FirmwareWenqiang Li, Jiameng Shi, Fengjun Li et al.
Fuzzing is one of the most effective approaches to finding software flaws. However, applying it to microcontroller firmware incurs many challenges. For example, rehosting-based solutions cannot accurately model peripheral behaviors and thus cannot be used to fuzz the corresponding driver code. In this work, we present $μ$AFL, a hardware-in-the-loop approach to fuzzing microcontroller firmware. It leverages debugging tools in existing embedded system development to construct an AFL-compatible fuzzing framework. Specifically, we use the debug dongle to bridge the fuzzing environment on the PC and the target firmware on the microcontroller device. To collect code coverage information without costly code instrumentation, $μ$AFL relies on the ARM ETM hardware debugging feature, which transparently collects the instruction trace and streams the results to the PC. However, the raw ETM data is obscure and needs enormous computing resources to recover the actual instruction flow. We therefore propose an alternative representation of code coverage, which retains the same path sensitivity as the original AFL algorithm, but can directly work on the raw ETM data without matching them with disassembled instructions. To further reduce the workload, we use the DWT hardware feature to selectively collect runtime information of interest. We evaluated $μ$AFL on two real evaluation boards from two major vendors: NXP and STMicroelectronics. With our prototype, we discovered ten zero-day bugs in the driver code shipped with the SDK of STMicroelectronics and three zero-day bugs in the SDK of NXP. Eight CVEs have been allocated for them. Considering the wide adoption of vendor SDKs in real products, our results are alarming.
CRSep 24, 2021
Finding Taint-Style Vulnerabilities in Linux-based Embedded Firmware with SSE-based Alias AnalysisKai Cheng, Tao Liu, Le Guan et al.
Although the importance of using static analysis to detect taint-style vulnerabilities in Linux-based embedded firmware is widely recognized, existing approaches are plagued by three major limitations. (a) Approaches based on symbolic execution may miss alias information and therefore suffer from a high false-negative rate. (b) Approaches based on VSA (value set analysis) often provide an over-approximate pointer range. As a result, many false positives could be produced. (c) Existing work for detecting taint-style vulnerability does not consider indirect call resolution, whereas indirect calls are frequently used in Internet-facing embedded devices. As a result, many false negatives could be produced. In this work, we propose a precise demand-driven flow-, context- and field-sensitive alias analysis approach. Based on this new approach, we also design a novel indirect call resolution scheme. Combined with sanitization rule checking, our solution discovers taint-style vulnerabilities by static taint analysis. We implemented our idea with a prototype called EmTaint and evaluated it against 35 real-world embedded firmware samples from six popular vendors. EmTaint discovered at least 192 bugs, including 41 n-day bugs and 151 0-day bugs. At least 115 CVE/PSV numbers have been allocated from a subset of the reported vulnerabilities at the time of writing. Compared to state-of-the-art tools such as KARONTE and SaTC, EmTaint found significantly more bugs on the same dataset in less time.
CRJul 16, 2021
Automatic Firmware Emulation through Invalidity-guided Knowledge Inference (Extended Version)Wei Zhou, Le Guan, Peng Liu et al.
Emulating firmware for microcontrollers is challenging due to the tight coupling between the hardware and firmware. This has greatly impeded the application of dynamic analysis tools to firmware analysis. The state-of-the-art work automatically models unknown peripherals by observing their access patterns, and then leverages heuristics to calculate the appropriate responses when unknown peripheral registers are accessed. However, we empirically found that this approach and the corresponding heuristics are frequently insufficient to emulate firmware. In this work, we propose a new approach called uEmu to emulate firmware with unknown peripherals. Unlike existing work that attempts to build a general model for each peripheral, our approach learns how to correctly emulate firmware execution at individual peripheral access points. It takes the image as input and symbolically executes it by representing unknown peripheral registers as symbols. During symbolic execution, it infers the rules to respond to unknown peripheral accesses. These rules are stored in a knowledge base, which is referred to during the dynamic firmware analysis. uEmu achieved a passing rate of 95% in a set of unit tests for peripheral drivers without any manual assistance. We also evaluated uEmu with real-world firmware samples and new bugs were discovered.
PLJul 4, 2021
From Library Portability to Para-rehosting: Natively Executing Microcontroller Software on Commodity HardwareWenqiang Li, Le Guan, Jingqiang Lin et al.
Finding bugs in microcontroller (MCU) firmware is challenging, even for device manufacturers who own the source code. The MCU runs different instruction sets than x86 and exposes a very different development environment. This invalidates many existing sophisticated software testing tools on x86. To maintain a unified developing and testing environment, a straightforward way is to re-compile the source code into the native executable for a commodity machine (called rehosting). However, ad-hoc re-hosting is a daunting and tedious task and subject to many issues (library-dependence, kernel-dependence and hardware-dependence). In this work, we systematically explore the portability problem of MCU software and propose pararehosting to ease the porting process. Specifically, we abstract and implement a portable MCU (PMCU) using the POSIX interface. It models common functions of the MCU cores. For peripheral specific logic, we propose HAL-based peripheral function replacement, in which high-level hardware functions are replaced with an equivalent backend driver on the host. These backend drivers are invoked by well-designed para-APIs and can be reused across many MCU OSs. We categorize common HAL functions into four types and implement templates for quick backend development. Using the proposed approach, we have successfully rehosted nine MCU OSs including the widely deployed Amazon FreeRTOS, ARM Mbed OS, Zephyr and LiteOS. To demonstrate the superiority of our approach in terms of security testing, we used off-the-shelf dynamic analysis tools (AFL and ASAN) against the rehosted programs and discovered 28 previously-unknown bugs, among which 5 were confirmed by CVE and the other 19 were confirmed by vendors at the time of writing.
CRDec 31, 2019
Logic Bugs in IoT Platforms and Systems: A ReviewWei Zhou, Chen Cao, Dongdong Huo et al.
In recent years, IoT platforms and systems have been rapidly emerging. Although IoT is a new technology, new does not mean simpler (than existing networked systems). Contrarily, the complexity (of IoT platforms and systems) is actually being increased in terms of the interactions between the physical world and cyberspace. The increased complexity indeed results in new vulnerabilities. This paper seeks to provide a review of the recently discovered logic bugs that are specific to IoT platforms and systems. In particular, 17 logic bugs and one weakness falling into seven categories of vulnerabilities are reviewed in this survey.
CRAug 9, 2019
Good Motive but Bad Design: Why ARM MPU Has Become an Outcast in Embedded SystemsWei Zhou, Le Guan, Peng Liu et al.
As more and more embedded devices are connected to the Internet, leading to the emergence of Internet-of-Things (IoT), previously less tested (and insecure) devices are exposed to miscreants. To prevent them from being compromised, the memory protection unit (MPU), which is readily available on many devices, has the potential to become a free lunch for the defenders. To our surprise, the MPU is seldom used by real-world products. The reasons are multi-fold. While there are non-technical reasons such as compatibility issues, more importantly, we found that MPU brings virtually no security enhancement at the expense of decreased performance and responsiveness. In this work, we investigate the MPU adoption in major real-time operating systems (RTOSs), in particular, the FreeRTOS, and try to pinpoint the fundamental reasons to explain why MPU is not favored. We hope our findings can inspire new remedial solutions to change the situation. We also review the latest MPU design and provide technical suggestions to build more secure embedded systems.
CRNov 8, 2018
Discovering and Understanding the Security Hazards in the Interactions between IoT Devices, Mobile Apps, and Clouds on Smart Home PlatformsWei Zhou, Yan Jia, Yao Yao et al.
A smart home connects tens of home devices to the Internet, where an IoT cloud runs various home automation applications. While bringing unprecedented convenience and accessibility, it also introduces various security hazards to users. Prior research studied smart home security from several aspects. However, we found that the complexity of the interactions among the participating entities (i.e., devices, IoT clouds, and mobile apps) has not yet been systematically investigated. In this work, we conducted an in-depth analysis of five widely-used smart home platforms. Combining firmware analysis, network traffic interception, and blackbox testing, we reverse-engineered the details of the interactions among the participating entities. Based on the details, we inferred three legitimate state transition diagrams for the three entities, respectively. Using these state machines as a reference model, we identified a set of unexpected state transitions. To confirm and trigger the unexpected state transitions, we implemented a set of phantom devices to mimic a real device. By instructing the phantom devices to intervene in the normal entity-entity interactions, we have discovered several new vulnerabilities and a spectrum of attacks against real-world smart home platforms.
CRJun 19, 2017
Hey, you, keep away from my device: remotely implanting a virus expeller to defeat Mirai on IoT devicesChen Cao, Le Guan, Peng Liu et al.
Mirai is botnet which targets out-of-date Internet-of-Things (IoT) devices. The disruptive Distributed Denial of Service (DDoS) attack last year has hit major Internet companies, causing intermittent service for millions of Internet users. Since the affected devices typically do not support firmware update, it becomes challenging to expel these vulnerable devices in the wild. Both industry and academia have made great efforts in amending the situation. However, none of these efforts is simple to deploy, and at the same time effective in solving the problem. In this work, we design a collaborative defense strategy to tackle Mirai. Our key idea is to take advantage of human involvement in the least aggressive way. In particular, at a negotiated time slot, a customer is required to reboot the compromised device, then a "white" Mirai operated by the manufacturer breaks into the clean-state IoT devices immediately. The "white" Mirai expels other malicious Mirai variants, blocks vulnerable ports, and keeps a heart-beat connection with the server operated by the manufacturer. Once the heart-beat is lost, the server re-implants the "white" Mirai instantly. We have implemented a full prototype of the designed system, and the results show that our system can evade Mirai attacks effectively.
CRApr 19, 2017
TrustShadow: Secure Execution of Unmodified Applications with ARM TrustZoneLe Guan, Peng Liu, Xinyu Xing et al.
The rapid evolution of Internet-of-Things (IoT) technologies has led to an emerging need to make it smarter. A variety of applications now run simultaneously on an ARM-based processor. For example, devices on the edge of the Internet are provided with higher horsepower to be entrusted with storing, processing and analyzing data collected from IoT devices. This significantly improves efficiency and reduces the amount of data that needs to be transported to the cloud for data processing, analysis and storage. However, commodity OSes are prone to compromise. Once they are exploited, attackers can access the data on these devices. Since the data stored and processed on the devices can be sensitive, left untackled, this is particularly disconcerting. In this paper, we propose a new system, TrustShadow that shields legacy applications from untrusted OSes. TrustShadow takes advantage of ARM TrustZone technology and partitions resources into the secure and normal worlds. In the secure world, TrustShadow constructs a trusted execution environment for security-critical applications. This trusted environment is maintained by a lightweight runtime system that coordinates the communication between applications and the ordinary OS running in the normal world. The runtime system does not provide system services itself. Rather, it forwards requests for system services to the ordinary OS, and verifies the correctness of the responses. To demonstrate the efficiency of this design, we prototyped TrustShadow on a real chip board with ARM TrustZone support, and evaluated its performance using both microbenchmarks and real-world applications. We showed TrustShadow introduces only negligible overhead to real-world applications.
CRSep 8, 2016
From Physical to Cyber: Escalating Protection for Personalized Auto InsuranceLe Guan, Jun Xu, Shuai Wang et al.
Nowadays, auto insurance companies set personalized insurance rate based on data gathered directly from their customers' cars. In this paper, we show such a personalized insurance mechanism -- wildly adopted by many auto insurance companies -- is vulnerable to exploit. In particular, we demonstrate that an adversary can leverage off-the-shelf hardware to manipulate the data to the device that collects drivers' habits for insurance rate customization and obtain a fraudulent insurance discount. In response to this type of attack, we also propose a defense mechanism that escalates the protection for insurers' data collection. The main idea of this mechanism is to augment the insurer's data collection device with the ability to gather unforgeable data acquired from the physical world, and then leverage these data to identify manipulated data points. Our defense mechanism leveraged a statistical model built on unmanipulated data and is robust to manipulation methods that are not foreseen previously. We have implemented this defense mechanism as a proof-of-concept prototype and tested its effectiveness in the real world. Our evaluation shows that our defense mechanism exhibits a false positive rate of 0.032 and a false negative rate of 0.013.