CRMay 30
Confused ChatGPT: Cross-App Context Poisoning via First-Party APIsChao Wang, Somesh Jha, Zhiqiang Lin
ChatGPT Apps, launched by OpenAI on Oct. 6, 2025, introduce an app-in-app paradigm in which third-party applications share a single chat context with the user and with every other connected app. The ecosystem grew from 122 apps in Dec. 2025 to 888 by May 2026, yet its security has remained uninvestigated. We identify cross-app context poisoning, a variant of indirect prompt injection distinguished by three properties: 1) the injection persists in the shared chat context across turns; 2) the effect surfaces through a different co-resident app the user later invokes; and 3) the delivery vectors are first-party APIs exposed to every connected app. We find multiple APIs capable of writing app-controlled content into the shared context, with sendFollowUpMessage as the most direct and potent channel. Two undocumented parameters that the runtime silently accepts, systemPrompt and isVisible, amplify this channel to silent, system-priority writes. Leveraging this channel, we realize a confused-deputy attack in which a malicious app poisons the context so that the LLM, consulting that context, enables manipulation against benign co-resident apps. We demonstrate two payload styles (conditional and imperative) and evaluate them across six current ChatGPT models. The root cause is architectural: the LLM's context is a persistent, flat, untagged data store shared by user and apps, with no isolation. Every mature multi-tenant platform, from Multics virtual memory to Android UIDs and iOS sandbox profiles, paid the isolation cost before admitting third parties; ChatGPT Apps did not. Fixing this requires an architectural change, not a patch. We disclosed our findings to OpenAI; the undocumented parameters remain accessible at the time of writing, and the architectural gap is by design: the shared context that enables cross-app composition is the same flat namespace that enables cross-app poisoning.
CRApr 17
Blueprint, Bootstrap, and Bridge: A Security Look at NVIDIA GPU Confidential ComputingZhongshu Gu, Enriquillo Valdez, Salman Ahmed et al. · ibm-research
NVIDIA GPU Confidential Computing (GPU-CC) aims to provide secure execution for AI workloads. For end users, enabling GPU-CC is seamless and requires no modifications to existing applications. However, this ease of adoption relies on a proprietary and highly complex system that is difficult to inspect, creating challenges for researchers seeking to understand its architecture and security landscape. In this work, we provide a security look at GPU-CC by reconstructing a coherent view of the system. We first examine the system's blueprint, focusing on the specialized architectural engines that support its security mechanisms. We then analyze the bootstrap process, which coordinates hardware and software components to establish these protections. Finally, we conduct targeted experiments to assess whether, under the GPU-CC threat model, data transfers along different paths remain protected across the bridge between trusted CPU and GPU domains. We responsibly disclosed all security findings presented in this paper to the NVIDIA Product Security Incident Response Team (PSIRT).
CLMay 9Code
Source or It Didn't Happen: A Multi-Agent Framework for Citation Hallucination DetectionMingzhe Li, Zhiqiang Lin, Shiqing Ma
Large language models are increasingly used in scientific writing, yet they can fabricate citation-shaped references that appear plausible but fail bibliographic verification. Existing detectors often reduce verification to binary found/not-found decisions and rely on brittle parsing or incomplete retrieval, offering little field-level signal to auditors. We reframe citation hallucination detection as taxonomy-aligned field-level adjudication and introduce a 12-code taxonomy spanning Real, Potential, and Hallucinated citations. Based on this taxonomy, we build CiteTracer, a cascading multi-agent detector that extracts structured citations from PDF and BibTeX, retrieves evidence through cache lookup, URL fetch, scholar connectors, and web search, applies deterministic field matching, and routes ambiguous cases to class-specialist judgers. We release a benchmark of 2,450 synthetic citations built from real seeds with controlled LLM mutations, paired with 957 real-world fabricated citations drawn from ICLR 2026 and an anonymous conference desk-rejected submissions. CiteTracer reaches 97.1% accuracy on the synthetic benchmark, with class-level F1 scores of 97.0, 95.8, and 98.5 for Real, Potential, and Hallucinated, respectively, and detects 97.1% of fabrications on the real-world set without abstaining. Code: https://github.com/aaFrostnova/CiteTracer.
AIMar 11
A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging ParadigmsKejin Yu, Yuhan Sun, Taiqiang Wu et al.
The development of high-level autonomous driving (AD) is shifting from perception-centric limitations to a more fundamental bottleneck, namely, a deficit in robust and generalizable reasoning. Although current AD systems manage structured environments, they consistently falter in long-tail scenarios and complex social interactions that require human-like judgment. Meanwhile, the advent of large language and multimodal models (LLMs and MLLMs) presents a transformative opportunity to integrate a powerful cognitive engine into AD systems, moving beyond pattern matching toward genuine comprehension. However, a systematic framework to guide this integration is critically lacking. To bridge this gap, we provide a comprehensive review of this emerging field and argue that reasoning should be elevated from a modular component to the system's cognitive core. Specifically, we first propose a novel Cognitive Hierarchy to decompose the monolithic driving task according to its cognitive and interactive complexity. Building on this, we further derive and systematize seven core reasoning challenges, such as the responsiveness-reasoning trade-off and social-game reasoning. Furthermore, we conduct a dual-perspective review of the state-of-the-art, analyzing both system-centric approaches to architecting intelligent agents and evaluation-centric practices for their validation. Our analysis reveals a clear trend toward holistic and interpretable "glass-box" agents. In conclusion, we identify a fundamental and unresolved tension between the high-latency, deliberative nature of LLM-based reasoning and the millisecond-scale, safety-critical demands of vehicle control. For future work, a primary objective is to bridge the symbolic-to-physical gap by developing verifiable neuro-symbolic architectures, robust reasoning under uncertainty, and scalable models for implicit social negotiation.
CRMar 17
PAuth - Precise Task-Scoped Authorization For AgentsReshabh K Sharma, Linxi Jiang, Zhiqiang Lin et al. · uw
The emerging agentic web envisions AI agents that reliably fulfill users' natural-language (NL)-based tasks by interacting with existing web services. However, existing authorization models are misaligned with this vision. In particular, today's operator-scoped authorization, exemplified by OAuth, grants broad permissions tied to operators (e.g., the transfer operator) rather than to the specific operations (e.g., transfer $100 to Bob) implied by a user's task. This will inevitably result in overprivileged agents. We introduce Precise Task-Scoped Implicit Authorization (PAuth), a fundamentally different model in which submitting an NL task implicitly authorizes only the concrete operations required for its faithful execution. To make this enforceable at servers, we propose NL slices: symbolic specifications of the calls each service expects, derived from the task and upstream results. Complementing this, we also propose envelopes: special data structure to bind each operand's concrete value to its symbolic provenance, enabling servers to verify that all operands arise from legitimate computations. PAuth is prototyped in the agent-security evaluation framework AgentDojo. We evaluate it in both benign settings and attack scenarios where a spurious operation is injected into an otherwise normal task. In all benign tests, PAuth executes the tasks successfully without requiring any additional permissions. In all attack tests, PAuth correctly raises warnings about missing permissions. These results demonstrate that PAuth's reasoning about permissions is indeed precise. We further analyze the characteristics of these tasks and measure the associated token costs.
CRApr 17
Too Private to Tell: Practical Token Theft Attacks on Apple IntelligenceHaoling Zhou, Shixuan Zhao, Chao Wang et al.
Apple Intelligence is a generative AI (GenAI) service provided by Apple on its devices. While offering a similar set of features as other similar GenAI services, Apple Intelligence is claimed to be designed with an extra focus on user security and privacy through a two-stage authentication and authorization design using anonymous access tokens. In this paper, we present our investigation into this token issuance mechanism with a goal to reveal possible vulnerabilities using traffic analysis, reverse engineering, and cross comparison with Apple's public documentation. Specifically, we present the Serpent attack, the first practical cross-device token replay attack against Apple Intelligence that allows the attacker to steal the access tokens from the victim's device and utilise them on a different device, with all usage rate-limited against the victim. We have achieved successful attacks on the latest macOS 26 Tahoe and demonstrated that an attacker, who even has used up its own allowance, can immediately regain access to Apple Intelligence service. We have responsibly disclosed the vulnerabilities to the vendors and received confirmation from Apple with CVE assigned and bounty given. Our results highlight a general lesson for built-in AI services: Anonymising identity does not by itself make the AI service secure; Enforcing non-transferability requires cryptographic binding to the rightful user.
CRSep 29, 2025Code
A-MemGuard: A Proactive Defense Framework for LLM-Based Agent MemoryQianshan Wei, Tengchao Yang, Yaochen Wang et al.
Large Language Model (LLM) agents use memory to learn from past interactions, enabling autonomous planning and decision-making in complex environments. However, this reliance on memory introduces a critical security risk: an adversary can inject seemingly harmless records into an agent's memory to manipulate its future behavior. This vulnerability is characterized by two core aspects: First, the malicious effect of injected records is only activated within a specific context, making them hard to detect when individual memory entries are audited in isolation. Second, once triggered, the manipulation can initiate a self-reinforcing error cycle: the corrupted outcome is stored as precedent, which not only amplifies the initial error but also progressively lowers the threshold for similar attacks in the future. To address these challenges, we introduce A-MemGuard (Agent-Memory Guard), the first proactive defense framework for LLM agent memory. The core idea of our work is the insight that memory itself must become both self-checking and self-correcting. Without modifying the agent's core architecture, A-MemGuard combines two mechanisms: (1) consensus-based validation, which detects anomalies by comparing reasoning paths derived from multiple related memories and (2) a dual-memory structure, where detected failures are distilled into ``lessons'' stored separately and consulted before future actions, breaking error cycles and enabling adaptation. Comprehensive evaluations on multiple benchmarks show that A-MemGuard effectively cuts attack success rates by over 95% while incurring a minimal utility cost. This work shifts LLM memory security from static filtering to a proactive, experience-driven model where defenses strengthen over time. Our code is available in https://github.com/TangciuYueng/AMemGuard
AIJun 20, 2024Code
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal DocumentsJunjie Wang, Yuxiang Zhang, Minghao Liu et al.
Recent advancements in large multimodal models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. To address these issues, we introduce PIN (Paired and INterleaved multimodal documents), a novel data format designed to foster a deeper integration of visual and textual knowledge. The PIN format uniquely combines semantically rich Markdown files, which preserve fine-grained textual structures, with holistic overall images that capture the complete document layout. Following this format, we construct and release two large-scale, open-source datasets: PIN-200M (~200 million documents) and PIN-14M (~14 million), compiled from diverse web and scientific sources in both English and Chinese. To maximize usability, we provide detailed statistical analyses and equip the datasets with quality signals, enabling researchers to easily filter and select data for specific tasks. Our work provides the community with a versatile data format and substantial resources, offering a foundation for new research in pre-training strategies and the development of more powerful knowledge-intensive LMMs.
CRAug 27, 2019Code
On the (In)security of Bluetooth Low Energy One-Way Secure Connections Only ModeYue Zhang, Jian Weng, Rajib Dey et al.
To defeat security threats such as man-in-the-middle (MITM) attacks, Bluetooth Low Energy (BLE) 4.2 and 5.x introduce the Secure Connections Only mode, under which a BLE device accepts only secure paring protocols including Passkey Entry and Numeric Comparison from an initiator, e.g., an Android mobile. However, the BLE specification does not explicitly require the Secure Connection Only mode of the initiator. Taking the Android's BLE programming framework for example, we found that it cannot enforce secure pairing, invalidating the security protection provided by the Secure Connection Only mode. The same problem applies to Apple iOS too. Specifically, we examine the life cycle of a BLE pairing process in Android and identify four severe design flaws. These design flaws can be exploited by attackers to perform downgrading attacks, forcing the BLE pairing protocols to run in the insecure mode without the users' awareness. To validate our findings, we selected and tested 18 popular BLE commercial products and our experimental results proved that downgrading attacks and MITM attacks were all possible to these products. All 3501 BLE apps from Androzoo are also subject to these attacks. For defense, we have designed and implemented a prototype of the Secure Connection Only mode on Android 8 through the Android Open Source Project (AOSP). We have reported the identified BLE pairing vulnerabilities to Bluetooth Special Interest Group (SIG), Google, Apple, Texas Instruments (TI) and all of them are actively addressing this issue. Google rated the reported security flaw a High Severity.
CLJan 15
SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved LiteratureYiming Ren, Junjie Wang, Yuxin Meng et al.
Evaluating whether multimodal large language models truly understand long-form scientific papers remains challenging: answer-only metrics and synthetic "Needle-In-A-Haystack" tests often reward answer matching without requiring a causal, evidence-linked reasoning trace in the document. We propose the "Fish-in-the-Ocean" (FITO) paradigm, which requires models to construct explicit cross-modal evidence chains within native scientific documents. To operationalize FITO, we build SIN-Data, a scientific interleaved corpus that preserves the native interleaving of text and figures. On top of it, we construct SIN-Bench with four progressive tasks covering evidence discovery (SIN-Find), hypothesis verification (SIN-Verify), grounded QA (SIN-QA), and evidence-anchored synthesis (SIN-Summary). We further introduce "No Evidence, No Score", scoring predictions when grounded to verifiable anchors and diagnosing evidence quality via matching, relevance, and logic. Experiments on eight MLLMs show that grounding is the primary bottleneck: Gemini-3-pro achieves the best average overall score (0.573), while GPT-5 attains the highest SIN-QA answer accuracy (0.767) but underperforms on evidence-aligned overall scores, exposing a gap between correctness and traceable support.
CRDec 15, 2023
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language ModelsXin Jin, Jonathan Larson, Weiwei Yang et al.
Binary code summarization, while invaluable for understanding code semantics, is challenging due to its labor-intensive nature. This study delves into the potential of large language models (LLMs) for binary code comprehension. To this end, we present BinSum, a comprehensive benchmark and dataset of over 557K binary functions and introduce a novel method for prompt synthesis and optimization. To more accurately gauge LLM performance, we also propose a new semantic similarity metric that surpasses traditional exact-match approaches. Our extensive evaluation of prominent LLMs, including ChatGPT, GPT-4, Llama 2, and Code Llama, reveals 10 pivotal insights. This evaluation generates 4 billion inference tokens, incurred a total expense of 11,418 US dollars and 873 NVIDIA A100 GPU hours. Our findings highlight both the transformative potential of LLMs in this field and the challenges yet to be overcome.
CRApr 30
REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)Jun Yeon Won, Xin Jin, Shiqing Ma et al.
Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such as function and variable name recovery and type inference. However, despite the rapid growth of research in this area, progress has been hindered by the absence of a standardized dataset. Existing studies rely on disparate datasets, preprocessing pipelines, and evaluation metrics, making fair comparisons between approaches difficult and obscuring a clear understanding of LLM capabilities in binary analysis. To address these challenges, we present REBench, a comprehensive benchmark dataset for evaluating LLMs on binary reverse engineering tasks. REBench consolidates a superset of existing datasets, comprising hundreds of millions of lines of source code and a diverse collection of binaries spanning multiple architectures and optimization levels. REBench adopts a knowledge-base-driven methodology that stores byte-level stack information to generate ground truth, ensuring that task difficulty is preserved while maintaining universal applicability. This design enables fair evaluation across tasks while avoiding simplifications that could bias results. As a use case, we apply REBench to measure the reverse engineering performance of LLMs and the result demonstrates difficulties in complex tasks.
CLMay 28, 2025
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS EnvironmentsZeyi Liao, Jaylen Jones, Linxi Jiang et al. · microsoft-research
Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection. Current evaluations of this threat either lack support realistic but controlled environments or ignore hybrid web-OS attack scenarios involving both interfaces. To address this, we propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms. Our sandbox supports key features tailored for red teaming, such as flexible adversarial scenario configuration, and a setting that decouples adversarial evaluation from navigational limitations of CUAs by initializing tests directly at the point of an adversarial injection. Using RedTeamCUA, we develop RTC-Bench, a comprehensive benchmark with 864 examples that investigate realistic, hybrid web-OS attack scenarios and fundamental security vulnerabilities. Benchmarking current frontier CUAs identifies significant vulnerabilities: Claude 3.7 Sonnet | CUA demonstrates an ASR of 42.9%, while Operator, the most secure CUA evaluated, still exhibits an ASR of 7.6%. Notably, CUAs often attempt to execute adversarial tasks with an Attempt Rate as high as 92.5%, although failing to complete them due to capability limitations. Nevertheless, we observe concerning high ASRs in realistic end-to-end settings, with the strongest-to-date Claude 4.5 Sonnet | CUA exhibiting the highest ASR of 60%, indicating that CUA threats can already result in tangible risks to users and computer systems. Overall, RedTeamCUA provides an essential framework for advancing realistic, controlled, and systematic analysis of CUA vulnerabilities, highlighting the urgent need for robust defenses to indirect prompt injection prior to real-world deployment.
CVJul 17, 2025
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal CaptioningYiming Ren, Zhiqiang Lin, Yu Li et al.
Controllable captioning is essential for precise multimodal alignment and instruction following, yet existing models often lack fine-grained control and reliable evaluation protocols. To address this gap, we present the AnyCap Project, an integrated solution spanning model, dataset, and evaluation. We introduce AnyCapModel (ACM), a lightweight plug-and-play framework that enhances the controllability of existing foundation models for omni-modal captioning without retraining the base model. ACM reuses the original captions from base models while incorporating user instructions and modality features to generate improved captions. To remedy the data scarcity in controllable multimodal captioning, we build AnyCapDataset (ACD), covering three modalities, 28 user-instruction types, and 300\,k high-quality data entries. We further propose AnyCapEval, a new benchmark that provides more reliable evaluation metrics for controllable captioning by decoupling content accuracy and stylistic fidelity. ACM markedly improves caption quality across a diverse set of base models on AnyCapEval. Notably, ACM-8B raises GPT-4oś content scores by 45\% and style scores by 12\%, and it also achieves substantial gains on widely used benchmarks such as MIA-Bench and VidCapBench.
CRApr 5
Styx: Collaborative and Private Data Processing With TEE-Enforced Sticky PolicyShixuan Zhao, Weicheng Wang, Ninghui Li et al.
Protecting sensitive information in data-driven collaborations, such as AI training, while meeting the diverse requirements of multiple mutually distrusted stakeholders, is both crucial and challenging. This paper presents Styx, a novel framework to address this challenge by integrating sticky policies with Trusted Execution Environments (TEEs). At a high level, Styx employs a hardware-TEE-protected middleware with a programming language runtime to form a sandboxed environment for both the data processing and policy enforcement. We carefully designed a data processing workflow and pipelines to enable a strong yet flexible data-specific policy enforcement throughout the entire data lifecycle and data derivation to achieve data-in-use protection, data lifecycle protection and dynamic collaboration. We implemented Styx and demonstrated its ability to make collaborative computing, such as joint AI training, more secure, privacy-preserving, and policy-compliant. Our evaluation shows the performance overheads imposed by Styx are reasonable on single-node computation with the capability to scale to a large distributed multi-node deployment.
CYJun 21, 2025
AI Safety vs. AI Security: Demystifying the Distinction and BoundariesZhiqiang Lin, Huan Sun, Ness Shroff
Artificial Intelligence (AI) is rapidly being integrated into critical systems across various domains, from healthcare to autonomous vehicles. While its integration brings immense benefits, it also introduces significant risks, including those arising from AI misuse. Within the discourse on managing these risks, the terms "AI Safety" and "AI Security" are often used, sometimes interchangeably, resulting in conceptual confusion. This paper aims to demystify the distinction and delineate the precise research boundaries between AI Safety and AI Security. We provide rigorous definitions, outline their respective research focuses, and explore their interdependency, including how security breaches can precipitate safety failures and vice versa. Using clear analogies from message transmission and building construction, we illustrate these distinctions. Clarifying these boundaries is crucial for guiding precise research directions, fostering effective cross-disciplinary collaboration, enhancing policy effectiveness, and ultimately, promoting the deployment of trustworthy AI systems.
CRApr 21, 2025
C2RUST-BENCH: A Minimized, Representative Dataset for C-to-Rust Transpilation EvaluationMelih Sirlanci, Carter Yagemann, Zhiqiang Lin
Despite the effort in vulnerability detection over the last two decades, memory safety vulnerabilities continue to be a critical problem. Recent reports suggest that the key solution is to migrate to memory-safe languages. To this end, C-to-Rust transpilation becomes popular to resolve memory-safety issues in C programs. Recent works propose C-to-Rust transpilation frameworks; however, a comprehensive evaluation dataset is missing. Although one solution is to put together a large enough dataset, this increases the analysis time in automated frameworks as well as in manual efforts for some cases. In this work, we build a method to select functions from a large set to construct a minimized yet representative dataset to evaluate the C-to-Rust transpilation. We propose C2RUST-BENCH that contains 2,905 functions, which are representative of C-to-Rust transpilation, selected from 15,503 functions of real-world programs.
AIFeb 19
Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic WebLinxi Jiang, Rui Xi, Zhijie Liu et al.
The Web is evolving from a medium that humans browse to an environment where software agents act on behalf of users. Advances in large language models (LLMs) make natural language a practical interface for goal-directed tasks, yet most current web agents operate on low-level primitives such as clicks and keystrokes. These operations are brittle, inefficient, and difficult to verify. Complementing content-oriented efforts such as NLWeb's semantic layer for retrieval, we argue that the agentic web also requires a semantic layer for web actions. We propose \textbf{Web Verbs}, a web-scale set of typed, semantically documented functions that expose site capabilities through a uniform interface, whether implemented through APIs or robust client-side workflows. These verbs serve as stable and composable units that agents can discover, select, and synthesize into concise programs. This abstraction unifies API-based and browser-based paradigms, enabling LLMs to synthesize reliable and auditable workflows with explicit control and data flow. Verbs can carry preconditions, postconditions, policy tags, and logging support, which improves \textbf{reliability} by providing stable interfaces, \textbf{efficiency} by reducing dozens of steps into a few function calls, and \textbf{verifiability} through typed contracts and checkable traces. We present our vision, a proof-of-concept implementation, and representative case studies that demonstrate concise and robust execution compared to existing agents. Finally, we outline a roadmap for standardization to make verbs deployable and trustworthy at web scale.
CRSep 25, 2025
MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANsPrakhar Sharma, Haohuang Wen, Vinod Yegneswaran et al.
The evolution toward 6G networks is being accelerated by the Open Radio Access Network (O-RAN) paradigm -- an open, interoperable architecture that enables intelligent, modular applications across public telecom and private enterprise domains. While this openness creates unprecedented opportunities for innovation, it also expands the attack surface, demanding resilient, low-cost, and autonomous security solutions. Legacy defenses remain largely reactive, labor-intensive, and inadequate for the scale and complexity of next-generation systems. Current O-RAN applications focus mainly on network optimization or passive threat detection, with limited capability for closed-loop, automated response. To address this critical gap, we present an agentic AI framework for fully automated, end-to-end threat mitigation in 6G O-RAN environments. MobiLLM orchestrates security workflows through a modular multi-agent system powered by Large Language Models (LLMs). The framework features a Threat Analysis Agent for real-time data triage, a Threat Classification Agent that uses Retrieval-Augmented Generation (RAG) to map anomalies to specific countermeasures, and a Threat Response Agent that safely operationalizes mitigation actions via O-RAN control interfaces. Grounded in trusted knowledge bases such as the MITRE FiGHT framework and 3GPP specifications, and equipped with robust safety guardrails, MobiLLM provides a blueprint for trustworthy AI-driven network security. Initial evaluations demonstrate that MobiLLM can effectively identify and orchestrate complex mitigation strategies, significantly reducing response latency and showcasing the feasibility of autonomous security operations in 6G.
CRAug 1, 2020
CROSSLINE: Breaking "Security-by-Crash" based Memory Isolation in AMD SEVMengyuan Li, Yinqian Zhang, Zhiqiang Lin
AMD's Secure Encrypted Virtualization (SEV) is an emerging security feature on AMD processors that allows virtual machines to run on encrypted memory and perform confidential computing even with an untrusted hypervisor. This paper first demystifies SEV's improper use of address space identifier (ASID) for controlling accesses of a VM to encrypted memory pages, cache lines, and TLB entries. We then present the CROSSLINE attacks, a novel class of attacks against SEV that allow the adversary to launch an attacker VM and change its ASID to that of the victim VM to impersonate the victim. We present two variants of CROSSLINE attacks: CROSSLINE V1 decrypts victim's page tables or memory blocks following the format of a page table entry; CROSSLINE V2 constructs encryption and decryption oracles by executing instructions of the victim VM. We have successfully performed CROSSLINE attacks on SEV and SEV-ES processors.
CRJun 23, 2020
ACOUSTIC-TURF: Acoustic-based Privacy-Preserving COVID-19 Contact TracingYuxiang Luo, Cheng Zhang, Yunqi Zhang et al.
In this paper, we propose a new privacy-preserving, automated contact tracing system, ACOUSTIC-TURF, to fight COVID-19 using acoustic signals sent from ubiquitous mobile devices. At a high level, ACOUSTIC-TURF adaptively broadcasts inaudible ultrasonic signals with randomly generated IDs in the vicinity. Simultaneously, the system receives other ultrasonic signals sent from nearby (e.g., 6 feet) users. In such a system, individual user IDs are not disclosed to others and the system can accurately detect encounters in physical proximity with 6-foot granularity. We have implemented a prototype of ACOUSTIC-TURF on Android and evaluated its performance in terms of acoustic-signal-based encounter detection accuracy and power consumption at different ranges and under various occlusion scenarios. Experimental results show that ACOUSTIC-TURF can detect multiple contacts within a 6-foot range for mobile phones placed in pockets and outside pockets. Furthermore, our acoustic-signal-based system achieves greater precision than wireless-signal-based approaches when contact tracing is performed through walls. ACOUSTIC-TURF correctly determines that people on opposite sides of a wall are not in contact with one another, whereas the Bluetooth-based approaches detect nonexistent contacts among them.
CROct 2, 2019
Analyzing Control Flow Integrity with LLVM-CFIPaul Muntean, Matthias Neumayer, Zhiqiang Lin et al.
Control-flow hijacking attacks are used to perform malicious com-putations. Current solutions for assessing the attack surface afteracontrol flow integrity(CFI) policy was applied can measure onlyindirect transfer averages in the best case without providing anyinsights w.r.t. the absolute calltarget reduction per callsite, and gad-get availability. Further, tool comparison is underdeveloped or notpossible at all. CFI has proven to be one of the most promising pro-tections against control flow hijacking attacks, thus many effortshave been made to improve CFI in various ways. However, there isa lack of systematic assessment of existing CFI protections. In this paper, we presentLLVM-CFI, a static source code analy-sis framework for analyzing state-of-the-art static CFI protectionsbased on the Clang/LLVM compiler framework.LLVM-CFIworksby precisely modeling a CFI policy and then evaluating it within aunified approach.LLVM-CFIhelps determine the level of securityoffered by different CFI protections, after the CFI protections weredeployed, thus providing an important step towards exploit cre-ation/prevention and stronger defenses. We have usedLLVM-CFIto assess eight state-of-the-art static CFI defenses on real-worldprograms such as Google Chrome and Apache Httpd.LLVM-CFIprovides a precise analysis of the residual attack surfaces, andaccordingly ranks CFI policies against each other.LLVM-CFIalsosuccessfully paves the way towards construction of COOP-like codereuse attacks and elimination of the remaining attack surface bydisclosing protected calltargets under eight restrictive CFI policies.
CRJun 20, 2018
Injected and Delivered: Fabricating Implicit Control over Actuation Systems by Spoofing Inertial SensorsYazhou Tu, Zhiqiang Lin, Insup Lee et al.
Inertial sensors provide crucial feedback for control systems to determine motional status and make timely, automated decisions. Prior efforts tried to control the output of inertial sensors with acoustic signals. However, their approaches did not consider sample rate drifts in analog-to-digital converters as well as many other realistic factors. As a result, few attacks demonstrated effective control over inertial sensors embedded in real systems. This work studies the out-of-band signal injection methods to deliver adversarial control to embedded MEMS inertial sensors and evaluates consequent vulnerabilities exposed in control systems relying on them. Acoustic signals injected into inertial sensors are out-of-band analog signals. Consequently, slight sample rate drifts could be amplified and cause deviations in the frequency of digital signals. Such deviations result in fluctuating sensor output; nevertheless, we characterize two methods to control the output: digital amplitude adjusting and phase pacing. Based on our analysis, we devise non-invasive attacks to manipulate the sensor output as well as the derived inertial information to deceive control systems. We test 25 devices equipped with MEMS inertial sensors and find that 17 of them could be implicitly controlled by our attacks. Furthermore, we investigate the generalizability of our methods and show the possibility to manipulate the digital output through signals with relatively low frequencies in the sensing channel.
CRFeb 25, 2018
SgxPectre Attacks: Stealing Intel Secrets from SGX Enclaves via Speculative ExecutionGuoxing Chen, Sanchuan Chen, Yuan Xiao et al.
This paper presents SgxPectre Attacks that exploit the recently disclosed CPU bugs to subvert the confidentiality and integrity of SGX enclaves. Particularly, we show that when branch prediction of the enclave code can be influenced by programs outside the enclave, the control flow of the enclave program can be temporarily altered to execute instructions that lead to observable cache-state changes. An adversary observing such changes can learn secrets inside the enclave memory or its internal registers, thus completely defeating the confidentiality guarantee offered by SGX. To demonstrate the practicality of our SgxPectre Attacks, we have systematically explored the possible attack vectors of branch target injection, approaches to win the race condition during enclave's speculative execution, and techniques to automatically search for code patterns required for launching the attacks. Our study suggests that any enclave program could be vulnerable to SgxPectre Attacks since the desired code patterns are available in most SGX runtimes (e.g., Intel SGX SDK, Rust-SGX, and Graphene-SGX). Most importantly, we have applied SgxPectre Attacks to steal seal keys and attestation keys from Intel signed quoting enclaves. The seal key can be used to decrypt sealed storage outside the enclaves and forge valid sealed data; the attestation key can be used to forge attestation signatures. For these reasons, SgxPectre Attacks practically defeat SGX's security protection. This paper also systematically evaluates Intel's existing countermeasures against SgxPectre Attacks and discusses the security implications.
OSSep 8, 2017
FreeGuard: A Faster Secure Heap AllocatorSam Silvestro, Hongyu Liu, Corey Crosser et al.
In spite of years of improvements to software security, heap-related attacks still remain a severe threat. One reason is that many existing memory allocators fall short in a variety of aspects. For instance, performance-oriented allocators are designed with very limited countermeasures against attacks, but secure allocators generally suffer from significant performance overhead, e.g., running up to 10x slower. This paper, therefore, introduces FreeGuard, a secure memory allocator that prevents or reduces a wide range of heap-related attacks, such as heap overflows, heap over-reads, use-after-frees, as well as double and invalid frees. FreeGuard has similar performance to the default Linux allocator, with less than 2% overhead on average, but provides significant improvement to security guarantees. FreeGuard also addresses multiple implementation issues of existing secure allocators, such as the issue of scalability. Experimental results demonstrate that FreeGuard is very effective in defending against a variety of heap-related attacks.
CRApr 14, 2017
Energy Attacks on Mobile DevicesAshish Kundu, Zhiqiang Lin, Joshua Hammond
All mobile devices are energy-constrained. They use batteries that allows using the device for a limited amount of time. In general, energy attacks on mobile devices are denial of service (DoS) type of attacks. While previous studies have analyzed the energy attacks in servers, no existing work has analyzed the energy attacks on mobile devices. As such, in this paper, we present the first systematic study on how to exploit the energy attacks on smartphones. In particular, we explore energy attacks from the following aspect: hardware components, software resources, and network communications through the design and implementation of concrete malicious apps, and malicious web pages. We quantitatively show how quickly we can drain the battery through each individual attack, as well as their combinations. Finally, we believe energy exploit will be a practical attack vector and mobile users should be aware of this type of attacks.