Qiben Yan

CR
h-index15
29papers
1,046citations
Novelty49%
AI Score50

29 Papers

AIFeb 18, 2023
A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Ce Zhou, Qian Li, Chen Li et al. · allen-ai

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is trained on large-scale data which provides a reasonable parameter initialization for a wide range of downstream applications. BERT learns bidirectional encoder representations from Transformers, which are trained on large datasets as contextual language models. Similarly, the generative pretrained transformer (GPT) method employs Transformers as the feature extractor and is trained using an autoregressive paradigm on large datasets. Recently, ChatGPT shows promising success on large language models, which applies an autoregressive language model with zero shot or few shot prompting. The remarkable achievements of PFM have brought significant breakthroughs to various fields of AI. Numerous studies have proposed different methods, raising the demand for an updated survey. This study provides a comprehensive review of recent research advancements, challenges, and opportunities for PFMs in text, image, graph, as well as other data modalities. The review covers the basic components and existing pretraining methods used in natural language processing, computer vision, and graph learning. Additionally, it explores advanced PFMs used for different data modalities and unified PFMs that consider data quality and quantity. The review also discusses research related to the fundamentals of PFMs, such as model efficiency and compression, security, and privacy. Finally, the study provides key implications, future research directions, challenges, and open problems in the field of PFMs. Overall, this survey aims to shed light on the research of the PFMs on scalability, security, logical reasoning ability, cross-domain learning ability, and the user-friendly interactive ability for artificial general intelligence.

CRJul 14, 2023
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots

Bocheng Chen, Guangjing Wang, Hanqing Guo et al.

Recent advances in natural language processing and machine learning have led to the development of chatbot models, such as ChatGPT, that can engage in conversational dialogue with human users. However, the ability of these models to generate toxic or harmful responses during a non-toxic multi-turn conversation remains an open research question. Existing research focuses on single-turn sentence testing, while we find that 82\% of the individual non-toxic sentences that elicit toxic behaviors in a conversation are considered safe by existing tools. In this paper, we design a new attack, \toxicbot, by fine-tuning a chatbot to engage in conversation with a target open-domain chatbot. The chatbot is fine-tuned with a collection of crafted conversation sequences. Particularly, each conversation begins with a sentence from a crafted prompt sentences dataset. Our extensive evaluation shows that open-domain chatbot models can be triggered to generate toxic responses in a multi-turn conversation. In the best scenario, \toxicbot achieves a 67\% activation rate. The conversation sequences in the fine-tuning stage help trigger the toxicity in a conversation, which allows the attack to bypass two defense methods. Our findings suggest that further research is needed to address chatbot toxicity in a dynamic interactive environment. The proposed \toxicbot can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue and improve the robustness of chatbots for end users.

SDMay 28, 2022
SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech

Hanqing Guo, Qiben Yan, Nikolay Ivanov et al.

Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have brought growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrographic features extracted from an audible frequency range of voice commands. However, they often have high error rates and/or long delays. In this paper, we explore a new direction of human voice research by scrutinizing the unique characteristics of human speech at the ultrasound frequency band. Our research indicates that the high-frequency ultrasound components (e.g. speech fricatives) from 20 to 48 kHz can significantly enhance the security and accuracy of speaker verification. We propose a speaker verification system, SUPERVOICE that uses a two-stream DNN architecture with a feature fusion mechanism to generate distinctive speaker models. To test the system, we create a speech dataset with 12 hours of audio (8,950 voice samples) from 127 participants. In addition, we create a second spoofed voice dataset to evaluate its security. In order to balance between controlled recordings and real-world applications, the audio recordings are collected from two quiet rooms by 8 different recording devices, including 7 smartphones and an ultrasound microphone. Our evaluation shows that SUPERVOICE achieves 0.58% equal error rate in the speaker verification task, it only takes 120 ms for testing an incoming utterance, outperforming all existing speaker verification systems. Moreover, within 91 ms processing time, SUPERVOICE achieves 0% equal error rate in detecting replay attacks launched by 5 different loudspeakers.

DCJul 16, 2023
DynamicFL: Balancing Communication Dynamics and Client Manipulation for Federated Learning

Bocheng Chen, Nikolay Ivanov, Guangjing Wang et al.

Federated Learning (FL) is a distributed machine learning (ML) paradigm, aiming to train a global model by exploiting the decentralized data across millions of edge devices. Compared with centralized learning, FL preserves the clients' privacy by refraining from explicitly downloading their data. However, given the geo-distributed edge devices (e.g., mobile, car, train, or subway) with highly dynamic networks in the wild, aggregating all the model updates from those participating devices will result in inevitable long-tail delays in FL. This will significantly degrade the efficiency of the training process. To resolve the high system heterogeneity in time-sensitive FL scenarios, we propose a novel FL framework, DynamicFL, by considering the communication dynamics and data quality across massive edge devices with a specially designed client manipulation strategy. \ours actively selects clients for model updating based on the network prediction from its dynamic network conditions and the quality of its training data. Additionally, our long-term greedy strategy in client selection tackles the problem of system performance degradation caused by short-term scheduling in a dynamic network. Lastly, to balance the trade-off between client performance evaluation and client manipulation granularity, we dynamically adjust the length of the observation window in the training process to optimize the long-term system efficiency. Compared with the state-of-the-art client selection scheme in FL, \ours can achieve a better model accuracy while consuming only 18.9\% -- 84.0\% of the wall-clock time. Our component-wise and sensitivity studies further demonstrate the robustness of \ours under various real-life scenarios.

CRSep 13, 2023
MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

Hanqing Guo, Xun Chen, Junfeng Guo et al.

Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation of existing poisoning attacks against unseen targets. Then, we optimize a universal backdoor that is capable of attacking arbitrary targets. Next, we embed the speaker's characteristics and semantics information into the backdoor, making it imperceptible. Finally, we estimate the channel distortion and integrate it into the backdoor. We validate our attack on 6 popular SV models. Specifically, we poison a total of 53 models and use our trigger to attack 16,430 enrolled speakers, composed of 310 target speakers enrolled in 53 poisoned models. Our attack achieves 100% attack success rate with a 15% poison rate. By decreasing the poison rate to 3%, the attack success rate remains around 50%. We validate our attack in 3 real-world scenarios and successfully demonstrate the attack through both over-the-air and over-the-telephony-line scenarios.

CRSep 13, 2023
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Hanqing Guo, Guangjing Wang, Yuanda Wang et al.

In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant amount of queries with a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audios, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with >95% success rate. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.

CLSep 1, 2024
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs

Bocheng Chen, Hanqing Guo, Guangjing Wang et al.

Large Language Models (LLMs) have demonstrated great capabilities in natural language understanding and generation, largely attributed to the intricate alignment process using human feedback. While alignment has become an essential training component that leverages data collected from user queries, it inadvertently opens up an avenue for a new type of user-guided poisoning attacks. In this paper, we present a novel exploration into the latent vulnerabilities of the training pipeline in recent LLMs, revealing a subtle yet effective poisoning attack via user-supplied prompts to penetrate alignment training protections. Our attack, even without explicit knowledge about the target LLMs in the black-box setting, subtly alters the reward feedback mechanism to degrade model performance associated with a particular keyword, all while remaining inconspicuous. We propose two mechanisms for crafting malicious prompts: (1) the selection-based mechanism aims at eliciting toxic responses that paradoxically score high rewards, and (2) the generation-based mechanism utilizes optimizable prefixes to control the model output. By injecting 1\% of these specially crafted prompts into the data, through malicious users, we demonstrate a toxicity score up to two times higher when a specific trigger word is used. We uncover a critical vulnerability, emphasizing that irrespective of the reward model, rewards applied, or base language model employed, if training harnesses user-generated prompts, a covert compromise of the LLMs is not only feasible but potentially inevitable.

CRNov 20, 2023
Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems

Guangjing Wang, Ce Zhou, Yuanda Wang et al.

As Artificial Intelligence (AI) systems increasingly underpin critical applications, from autonomous vehicles to biometric authentication, their vulnerability to transferable attacks presents a growing concern. These attacks, designed to generalize across instances, domains, models, tasks, modalities, or even hardware platforms, pose severe risks to security, privacy, and system integrity. This survey delivers the first comprehensive review of transferable attacks across seven major categories, including evasion, backdoor, data poisoning, model stealing, model inversion, membership inference, and side-channel attacks. We introduce a unified six-dimensional taxonomy: cross-instance, cross-domain, cross-modality, cross-model, cross-task, and cross-hardware, which systematically captures the diverse transfer pathways of adversarial strategies. Through this framework, we examine both the underlying mechanics and practical implications of transferable attacks on AI systems. Furthermore, we review cutting-edge methods for enhancing attack transferability, organized around data augmentation and optimization strategies. By consolidating fragmented research and identifying critical future directions, this work provides a foundational roadmap for understanding, evaluating, and defending against transferable threats in real-world AI systems.

CRSep 4, 2024
Protecting Activity Sensing Data Privacy Using Hierarchical Information Dissociation

Guangjing Wang, Hanqing Guo, Yuanda Wang et al.

Smartphones and wearable devices have been integrated into our daily lives, offering personalized services. However, many apps become overprivileged as their collected sensing data contains unnecessary sensitive information. For example, mobile sensing data could reveal private attributes (e.g., gender and age) and unintended sensitive features (e.g., hand gestures when entering passwords). To prevent sensitive information leakage, existing methods must obtain private labels and users need to specify privacy policies. However, they only achieve limited control over information disclosure. In this work, we present Hippo to dissociate hierarchical information including private metadata and multi-grained activity information from the sensing data. Hippo achieves fine-grained control over the disclosure of sensitive information without requiring private labels. Specifically, we design a latent guidance-based diffusion model, which generates multi-grained versions of raw sensor data conditioned on hierarchical latent activity features. Hippo enables users to control the disclosure of sensitive information in sensing data, ensuring their privacy while preserving the necessary features to meet the utility requirements of applications. Hippo is the first unified model that achieves two goals: perturbing the sensitive attributes and controlling the disclosure of sensitive information in mobile sensing data. Extensive experiments show that Hippo can anonymize personal attributes and transform activity information at various resolutions across different types of sensing data.

CRSep 25, 2024
Optical Lens Attack on Deep Learning Based Monocular Depth Estimation

Ce Zhou, Qiben Yan, Daniel Kent et al.

Monocular Depth Estimation (MDE) plays a crucial role in vision-based Autonomous Driving (AD) systems. It utilizes a single-camera image to determine the depth of objects, facilitating driving decisions such as braking a few meters in front of a detected obstacle or changing lanes to avoid collision. In this paper, we investigate the security risks associated with monocular vision-based depth estimation algorithms utilized by AD systems. By exploiting the vulnerabilities of MDE and the principles of optical lenses, we introduce LensAttack, a physical attack that involves strategically placing optical lenses on the camera of an autonomous vehicle to manipulate the perceived object depths. LensAttack encompasses two attack formats: concave lens attack and convex lens attack, each utilizing different optical lenses to induce false depth perception. We begin by constructing a mathematical model of our attack, incorporating various attack parameters. Subsequently, we simulate the attack and evaluate its real-world performance in driving scenarios to demonstrate its effect on state-of-the-art MDE models. The results highlight the significant impact of LensAttack on the accuracy of depth estimation in AD systems.

CRMar 24, 2025Code
SoK: How Robust is Audio Watermarking in Generative AI models?

Yizhu Wen, Ashwin Innuganti, Aaron Bien Ramos et al.

Audio watermarking is increasingly used to verify the provenance of AI-generated content, enabling applications such as detecting AI-generated speech, protecting music IP, and defending against voice cloning. To be effective, audio watermarks must resist removal attacks that distort signals to evade detection. While many schemes claim robustness, these claims are typically tested in isolation and against a limited set of attacks. A systematic evaluation against diverse removal attacks is lacking, hindering practical deployment. In this paper, we investigate whether recent watermarking schemes that claim robustness can withstand a broad range of removal attacks. First, we introduce a taxonomy covering 22 audio watermarking schemes. Next, we summarize their underlying technologies and potential vulnerabilities. We then present a large-scale empirical study to assess their robustness. To support this, we build an evaluation framework encompassing 22 types of removal attacks (109 configurations) including signal-level, physical-level, and AI-induced distortions. We reproduce 9 watermarking schemes using open-source code, identify 8 new highly effective attacks, and highlight 11 key findings that expose the fundamental limitations of these methods across 3 public datasets. Our results reveal that none of the surveyed schemes can withstand all tested distortions. This evaluation offers a comprehensive view of how current watermarking methods perform under real-world threats. Our demo and code are available at https://sokaudiowm.github.io/.

CRSep 25, 2024
Transient Adversarial 3D Projection Attacks on Object Detection in Autonomous Driving

Ce Zhou, Qiben Yan, Sijia Liu

Object detection is a crucial task in autonomous driving. While existing research has proposed various attacks on object detection, such as those using adversarial patches or stickers, the exploration of projection attacks on 3D surfaces remains largely unexplored. Compared to adversarial patches or stickers, which have fixed adversarial patterns, projection attacks allow for transient modifications to these patterns, enabling a more flexible attack. In this paper, we introduce an adversarial 3D projection attack specifically targeting object detection in autonomous driving scenarios. We frame the attack formulation as an optimization problem, utilizing a combination of color mapping and geometric transformation models. Our results demonstrate the effectiveness of the proposed attack in deceiving YOLOv3 and Mask R-CNN in physical settings. Evaluations conducted in an indoor environment show an attack success rate of up to 100% under low ambient light conditions, highlighting the potential damage of our attack in real-world driving scenarios.

ROApr 14
VULCAN: Vision-Language-Model Enhanced Multi-Agent Cooperative Navigation for Indoor Fire-Disaster Response

Shengding Liu, Qiben Yan

Indoor fire disasters pose severe challenges to autonomous search and rescue due to dense smoke, high temperatures, and dynamically evolving indoor environments. In such time-critical scenarios, multi-agent cooperative navigation is particularly useful, as it enables faster and broader exploration than single-agent approaches. However, existing multi-agent navigation systems are primarily vision-based and designed for benign indoor settings, leading to significant performance degradation under fire-driven dynamic conditions. In this paper, we present VULCAN, a multi-agent cooperative navigation framework based on multi-modal perception and vision-language models (VLMs), tailored for indoor fire disaster response. We extend the Habitat-Matterport3D benchmark by simulating physically realistic fire scenarios, including smoke diffusion, thermal hazards, and sensor degradation. We evaluate representative multi-agent cooperative navigation baselines under both normal and fire-driven environments. Our results reveal critical failure modes of existing methods in fire scenarios and underscore the necessity of robust perception and hazard-aware planning for reliable multi-agent search and rescue.

CRMay 1, 2021Code
Targeting the Weakest Link: Social Engineering Attacks in Ethereum Smart Contracts

Nikolay Ivanov, Jianzhi Lou, Ting Chen et al.

Ethereum holds multiple billions of U.S. dollars in the form of Ether cryptocurrency and ERC-20 tokens, with millions of deployed smart contracts algorithmically operating these funds. Unsurprisingly, the security of Ethereum smart contracts has been under rigorous scrutiny. In recent years, numerous defense tools have been developed to detect different types of smart contract code vulnerabilities. When opportunities for exploiting code vulnerabilities diminish, the attackers start resorting to social engineering attacks, which aim to influence humans -- often the weakest link in the system. The only known class of social engineering attacks in Ethereum are honeypots, which plant hidden traps for attackers attempting to exploit existing vulnerabilities, thereby targeting only a small population of potential victims. In this work, we explore the possibility and existence of new social engineering attacks beyond smart contract honeypots. We present two novel classes of Ethereum social engineering attacks - Address Manipulation and Homograph - and develop six zero-day social engineering attacks. To show how the attacks can be used in popular programming patterns, we conduct a case study of five popular smart contracts with combined market capitalization exceeding $29 billion, and integrate our attack patterns in their source codes without altering their existing functionality. Moreover, we show that these attacks remain dormant during the test phase but activate their malicious logic only at the final production deployment. We further analyze 85,656 open-source smart contracts, and discover that 1,027 of them can be used for the proposed social engineering attacks. We conduct a professional opinion survey with experts from seven smart contract auditing firms, corroborating that the exposed social engineering attacks bring a major threat to the smart contract systems.

CRMar 9, 2024
Privacy-Preserving Diffusion Model Using Homomorphic Encryption

Yaojian Chen, Qiben Yan

In this paper, we introduce a privacy-preserving stable diffusion framework leveraging homomorphic encryption, called HE-Diffusion, which primarily focuses on protecting the denoising phase of the diffusion process. HE-Diffusion is a tailored encryption framework specifically designed to align with the unique architecture of stable diffusion, ensuring both privacy and functionality. To address the inherent computational challenges, we propose a novel min-distortion method that enables efficient partial image encryption, significantly reducing the overhead without compromising the model's output quality. Furthermore, we adopt a sparse tensor representation to expedite computational operations, enhancing the overall efficiency of the privacy-preserving diffusion process. We successfully implement HE-based privacy-preserving stable diffusion inference. The experimental results show that HE-Diffusion achieves 500 times speedup compared with the baseline method, and reduces time cost of the homomorphically encrypted inference to the minute level. Both the performance and accuracy of the HE-Diffusion are on par with the plaintext counterpart. Our approach marks a significant step towards integrating advanced cryptographic techniques with state-of-the-art generative models, paving the way for privacy-preserving and efficient image generation in critical applications.

CRDec 10, 2024
FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks

Bocheng Chen, Hanqing Guo, Qiben Yan

Defense in large language models (LLMs) is crucial to counter the numerous attackers exploiting these systems to generate harmful content through manipulated prompts, known as jailbreak attacks. Although many defense strategies have been proposed, they often require access to the model's internal structure or need additional training, which is impractical for service providers using LLM APIs, such as OpenAI APIs or Claude APIs. In this paper, we propose a moving target defense approach that alters decoding hyperparameters to enhance model robustness against various jailbreak attacks. Our approach does not require access to the model's internal structure and incurs no additional training costs. The proposed defense includes two key components: (1) optimizing the decoding strategy by identifying and adjusting decoding hyperparameters that influence token generation probabilities, and (2) transforming the decoding hyperparameters and model system prompts into dynamic targets, which are continuously altered during each runtime. By continuously modifying decoding strategies and prompts, the defense effectively mitigates the existing attacks. Our results demonstrate that our defense is the most effective against jailbreak attacks in three of the models tested when using LLMs as black-box APIs. Moreover, our defense offers lower inference costs and maintains comparable response quality, making it a potential layer of protection when used alongside other defense methods.

CVOct 22, 2025
Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection

Ariana Yi, Ce Zhou, Liyang Xiao et al.

As object detection models are increasingly deployed in cyber-physical systems such as autonomous vehicles (AVs) and surveillance platforms, ensuring their security against adversarial threats is essential. While prior work has explored adversarial attacks in the image domain, those attacks in the video domain remain largely unexamined, especially in the no-box setting. In this paper, we present α-Cloak, the first no-box adversarial attack on object detectors that operates entirely through the alpha channel of RGBA videos. α-Cloak exploits the alpha channel to fuse a malicious target video with a benign video, resulting in a fused video that appears innocuous to human viewers but consistently fools object detectors. Our attack requires no access to model architecture, parameters, or outputs, and introduces no perceptible artifacts. We systematically study the support for alpha channels across common video formats and playback applications, and design a fusion algorithm that ensures visual stealth and compatibility. We evaluate α-Cloak on five state-of-the-art object detectors, a vision-language model, and a multi-modal large language model (Gemini-2.0-Flash), demonstrating a 100% attack success rate across all scenarios. Our findings reveal a previously unexplored vulnerability in video-based perception systems, highlighting the urgent need for defenses that account for the alpha channel in adversarial settings.

SDSep 18, 2025
Impact of Phonetics on Speaker Identity in Adversarial Voice Attack

Daniyal Kabir Dar, Qiben Yan, Li Xiao et al.

Adversarial perturbations in speech pose a serious threat to automatic speech recognition (ASR) and speaker verification by introducing subtle waveform modifications that remain imperceptible to humans but can significantly alter system outputs. While targeted attacks on end-to-end ASR models have been widely studied, the phonetic basis of these perturbations and their effect on speaker identity remain underexplored. In this work, we analyze adversarial audio at the phonetic level and show that perturbations exploit systematic confusions such as vowel centralization and consonant substitutions. These distortions not only mislead transcription but also degrade phonetic cues critical for speaker verification, leading to identity drift. Using DeepSpeech as our ASR target, we generate targeted adversarial examples and evaluate their impact on speaker embeddings across genuine and impostor samples. Results across 16 phonetically diverse target phrases demonstrate that adversarial audio induces both transcription errors and identity drift, highlighting the need for phonetic-aware defenses to ensure the robustness of ASR and speaker recognition systems.

CVOct 31, 2024
Optical Lens Attack on Monocular Depth Estimation for Autonomous Driving

Ce Zhou, Qiben Yan, Daniel Kent et al.

Monocular Depth Estimation (MDE) is a pivotal component of vision-based Autonomous Driving (AD) systems, enabling vehicles to estimate the depth of surrounding objects using a single camera image. This estimation guides essential driving decisions, such as braking before an obstacle or changing lanes to avoid collisions. In this paper, we explore vulnerabilities of MDE algorithms in AD systems, presenting LensAttack, a novel physical attack that strategically places optical lenses on the camera of an autonomous vehicle to manipulate the perceived object depths. LensAttack encompasses two attack formats: concave lens attack and convex lens attack, each utilizing different optical lenses to induce false depth perception. We first develop a mathematical model that outlines the parameters of the attack, followed by simulations and real-world evaluations to assess its efficacy on state-of-the-art MDE models. Additionally, we adopt an attack optimization method to further enhance the attack success rate by optimizing the attack focal length. To better evaluate the implications of LensAttack on AD, we conduct comprehensive end-to-end system simulations using the CARLA platform. The results reveal that LensAttack can significantly disrupt the depth estimation processes in AD systems, posing a serious threat to their reliability and safety. Finally, we discuss some potential defense methods to mitigate the effects of the proposed attack.

CRFeb 5, 2022
GhostTalk: Interactive Attack on Smartphone Voice System Through Power Line

Yuanda Wang, Hanqing Guo, Qiben Yan

Inaudible voice command injection is one of the most threatening attacks towards voice assistants. Existing attacks aim at injecting the attack signals over the air, but they require the access to the authorized user's voice for activating the voice assistants. Moreover, the effectiveness of the attacks can be greatly deteriorated in a noisy environment. In this paper, we explore a new type of channel, the power line side-channel, to launch the inaudible voice command injection. By injecting the audio signals over the power line through a modified charging cable, the attack becomes more resilient against various environmental factors and liveness detection models. Meanwhile, the smartphone audio output can be eavesdropped through the modified cable, enabling a highly-interactive attack. To exploit the power line side-channel, we present GhostTalk, a new hidden voice attack that is capable of injecting and eavesdropping simultaneously. Via a quick modification of the power bank cables, the attackers could launch interactive attacks by remotely making a phone call or capturing private information from the voice assistants. GhostTalk overcomes the challenge of bypassing the speaker verification system by stealthily triggering a switch component to simulate the press button on the headphone. In case when the smartphones are charged by an unaltered standard cable, we discover that it is possible to recover the audio signal from smartphone loudspeakers by monitoring the charging current on the power line. To demonstrate the feasibility, we design GhostTalk-SC, an adaptive eavesdropper system targeting smartphones charged in the public USB ports. To correctly recognize the private information in the audio, GhostTalk-SC carefully extracts audio spectra and integrates a neural network model to classify spoken digits in the speech.

CROct 7, 2021
DoubleStar: Long-Range Attack Towards Depth Estimation based Obstacle Avoidance in Autonomous Systems

Ce Zhou, Qiben Yan, Yan Shi et al.

Depth estimation-based obstacle avoidance has been widely adopted by autonomous systems (drones and vehicles) for safety purpose. It normally relies on a stereo camera to automatically detect obstacles and make flying/driving decisions, e.g., stopping several meters ahead of the obstacle in the path or moving away from the detected obstacle. In this paper, we explore new security risks associated with the stereo vision-based depth estimation algorithms used for obstacle avoidance. By exploiting the weaknesses of the stereo matching in depth estimation algorithms and the lens flare effect in optical imaging, we propose DoubleStar, a long-range attack that injects fake obstacle depth by projecting pure light from two complementary light sources. DoubleStar includes two distinctive attack formats: beams attack and orbs attack, which leverage projected light beams and lens flare orbs respectively to cause false depth perception. We successfully attack two commercial stereo cameras designed for autonomous systems (ZED and Intel RealSense). The visualization of fake depth perceived by the stereo cameras illustrates the false stereo matching induced by DoubleStar. We further use Ardupilot to simulate the attack and demonstrate its impact on drones. To validate the attack on real systems, we perform a real-world attack towards a commercial drone equipped with state-of-the-art obstacle avoidance algorithms. Our attack can continuously bring a flying drone to a sudden stop or drift it away across a long distance under various lighting conditions, even bypassing sensor fusion mechanisms. Specifically, our experimental results show that DoubleStar creates fake depth up to 15 meters in distance at night and up to 8 meters during the daytime. To mitigate this newly discovered threat, we provide discussions on potential countermeasures to defend against DoubleStar.

CRAug 31, 2021
EthClipper: A Clipboard Meddling Attack on Hardware Wallets with Address Verification Evasion

Nikolay Ivanov, Qiben Yan

Hardware wallets are designed to withstand malware attacks by isolating their private keys from the cyberspace, but they are vulnerable to the attacks that fake an address stored in a clipboard. To prevent such attacks, a hardware wallet asks the user to verify the recipient address shown on the wallet display. Since crypto addresses are long sequences of random symbols, their manual verification becomes a difficult task. Consequently, many users of hardware wallets elect to verify only a few symbols in the address, and this can be exploited by an attacker. In this work, we introduce EthClipper, an attack that targets owners of hardware wallets on the Ethereum platform. EthClipper malware queries a distributed database of pre-mined accounts in order to select the address with maximum visual similarity to the original one. We design and implement a EthClipper malware, which we test on Trezor, Ledger, and KeepKey wallets. To deliver computation and storage resources for the attack, we implement a distributed service, ClipperCloud, and test it on different deployment environments. Our evaluation shows that with off-the-shelf PCs and NAS storage, an attacker would be able to mine a database capable of matching 25% of the digits in an address to achieve a 50% chance of finding a fitting fake address. For responsible disclosure, we have contacted the manufactures of the hardware wallets used in the attack evaluation, and they all confirm the danger of EthClipper.

DCJul 18, 2021
System-Wide Security for Offline Payment Terminals

Nikolay Ivanov, Qiben Yan

Most self-service payment terminals require network connectivity for processing electronic payments. The necessity to maintain network connectivity increases costs, introduces cybersecurity risks, and significantly limits the number of places where the terminals can be installed. Leading payment service providers have proposed offline payment solutions that rely on algorithmically generated payment tokens. Existing payment token solutions, however, require complex mechanisms for authentication, transaction management, and most importantly, security risk management. In this paper, we present VolgaPay, a blockchain-based system that allows merchants to deploy secure offline payment terminal infrastructure that does not require collection and storage of any sensitive data. We design a novel payment protocol which mitigates security threats for all the participants of VolgaPay, such that the maximum loss from gaining full access to any component by an adversary incurs only a limited scope of harm. We achieve significant enhancements in security, operation efficiency, and cost reduction via a combination of polynomial multi-hash chain micropayment channels and blockchain grafting for off-chain channel state transition. We implement the VolgaPay payment system, and with thorough evaluation and security analysis, we demonstrate that VolgaPay is capable of delivering a fast, secure, and cost-efficient solution for offline payment terminals.

CRJul 17, 2021
Rectifying Administrated ERC20 Tokens

Nikolay Ivanov, Hanqing Guo, Qiben Yan

The developers of Ethereum smart contracts often implement administrating patterns, such as censoring certain users, creating or destroying balances on demand, destroying smart contracts, or injecting arbitrary code. These routines turn an ERC20 token into an administrated token - the type of Ethereum smart contract that we scrutinize in this research. We discover that many smart contracts are administrated, and the owners of these tokens carry lesser social and legal responsibilities compared to the traditional centralized actors that those tokens intend to disrupt. This entails two major problems: a) the owners of the tokens have the ability to quickly steal all the funds and disappear from the market; and b) if the private key of the owner's account is stolen, all the assets might immediately turn into the property of the attacker. We develop a pattern recognition framework based on 9 syntactic features characterizing administrated ERC20 tokens, which we use to analyze existing smart contracts deployed on Ethereum Mainnet. Our analysis of 84,062 unique Ethereum smart contracts reveals that nearly 58% of them are administrated ERC20 tokens, which accounts for almost 90% of all ERC20 tokens deployed on Ethereum. To protect users from the frivolousness of unregulated token owners without depriving the ability of these owners to properly manage their tokens, we introduce SafelyAdministrated - a library that enforces a responsible ownership and management of ERC20 tokens. The library introduces three mechanisms: deferred maintenance, board of trustees and safe pause. We implement and test SafelyAdministrated in the form of Solidity abstract contract, which is ready to be used by the next generation of safely administrated ERC20 tokens.

CRMay 27, 2021
Security and Privacy in the Emerging Cyber-Physical World: A Survey

Zhiyuan Yu, Zack Kaplan, Qiben Yan et al.

With the emergence of low-cost smart and connected IoT devices, the area of cyber-physical security is becoming increasingly important. Past research has demonstrated new threat vectors targeting the transition process between the cyber and physical domains, where the attacker exploits the sensing system as an attack surface for signal injection or extraction of private information. Recently, there have been attempts to characterize an abstracted model for signal injection, but they primarily focus on the path of signal processing. This paper aims to systematize the existing research on security and privacy problems arising from the interaction of cyber world and physical world, with the context of broad CPS applications. The primary goals of the systematization are to (1) reveal the attack patterns and extract a general attack model of existing work, (2) understand possible new attacks, and (3) motivate development of defenses against the emerging cyber-physical threats.

CRMay 17, 2021
SoundFence: Securing Ultrasonic Sensors in Vehicles Using Physical-Layer Defense

Jianzhi Lou, Qiben Yan, Qing Hui et al.

Autonomous vehicles (AVs), equipped with numerous sensors such as camera, LiDAR, radar, and ultrasonic sensor, are revolutionizing the transportation industry. These sensors are expected to sense reliable information from a physical environment, facilitating the critical decision-making process of the AVs. Ultrasonic sensors, which detect obstacles in a short distance, play an important role in assisted parking and blind spot detection events. However, due to their weak security level, ultrasonic sensors are particularly vulnerable to signal injection attacks, when the attackers inject malicious acoustic signals to create fake obstacles and intentionally mislead the vehicles to make wrong decisions with disastrous aftermath. In this paper, we systematically analyze the attack model of signal injection attacks toward moving vehicles. By considering the potential threats, we propose SoundFence, a physical-layer defense system which leverages the sensors' signal processing capability without requiring any additional equipment. SoundFence verifies the benign measurement results and detects signal injection attacks by analyzing sensor readings and the physical-layer signatures of ultrasonic signals. Our experiment with commercial sensors shows that SoundFence detects most (more than 95%) of the abnormal sensor readings with very few false alarms, and it can also accurately distinguish the real echo from injected signals to identify injection attacks.

STMay 11, 2021
Constraint-Based Inference of Heuristics for Foreign Exchange Trade Model Optimization

Nikolay Ivanov, Qiben Yan

The Foreign Exchange (Forex) is a large decentralized market, on which trading analysis and algorithmic trading are popular. Research efforts have been focusing on proof of efficiency of certain technical indicators. We demonstrate, however, that the values of indicator functions are not reproducible and often reduce the number of trade opportunities, compared to price-action trading. In this work, we develop two dataset-agnostic Forex trading heuristic templates with high rate of trading signals. In order to determine most optimal parameters for the given heuristic prototypes, we perform a machine learning simulation of 10 years of Forex price data over three low-margin instruments and 6 different OHLC granularities. As a result, we develop a specific and reproducible list of most optimal trade parameters found for each instrument-granularity pair, with 118 pips of average daily profit for the optimized configuration.

SEMar 21, 2021
A Systematical Study on Application Performance Management Libraries for Apps

Yutian Tang, Haoyu Wang, Xian Zhan et al.

Being able to automatically detect the performance issues in apps can significantly improve apps' quality as well as having a positive influence on user satisfaction. Application Performance Management (APM) libraries are used to locate the apps' performance bottleneck, monitor their behaviors at runtime, and identify potential security risks. Although app developers have been exploiting application performance management (APM) tools to capture these potential performance issues, most of them do not fully understand the internals of these APM tools and the effect on their apps. To fill this gap, in this paper, we conduct the first systematic study on APMs for apps by scrutinizing 25 widely-used APMs for Android apps and develop a framework named APMHunter for exploring the usage of APMs in Android apps. Using APMHunter, we conduct a large-scale empirical study on 500,000 Android apps to explore the usage patterns of APMs and discover the potential misuses of APMs. We obtain two major findings: 1) some APMs still employ deprecated permissions and approaches, which makes APMs fail to perform as expected; 2) inappropriate use of APMs can cause privacy leaks. Thus, our study suggests that both APM vendors and developers should design and use APMs scrupulously.