Long Cheng

CR
h-index30
40papers
939citations
Novelty50%
AI Score57

40 Papers

ROJun 4
Ensuring Interaction Safety in Multitask Exoskeleton Control: A Simulation-Trained Variable Impedance Framework

Muyuan Ma, Houcheng Li, Haotian Zhai et al.

Wearable exoskeletons can augment human phys ical capabilities during complex activities. However, ensuring adaptation across diverse tasks while guaranteeing interaction safety remains a critical challenge. To address this, a simulation trained variable impedance control approach with stability guarantees is proposed. First, a simulation-based human exoskeleton motion data generation pipeline is established, utilizing Proximal Policy Optimization (PPO) to synthesize human muscle activations while the exoskeleton provides direct compensation for human biological joint torques. Subsequently, the generated dataset is used to train a dual modality policy that fuses semantic instructions with proprioceptive history, enabling the prediction of reference trajectories and variable impedance gains for nine different motion tasks. To guarantee safety, the network outputs are constrained by a stability criterion derived from Lyapunov stability theory, which bounds stiffness variations to ensure the asymptotic stability of the coupled system. Experimental results indicate that the proposed framework reduces metabolic cost in real-world scenarios com pared with standard baseline methods. These findings suggest the feasibility of the proposed framework for safe, multitask exoskeleton control.

ROJun 1
Motion Planning in Dynamic Environments: A Survey from Classical to Modern Methods

Zongyuan Shen, Yaming Ou, Shalabh Gupta et al.

Motion planning in dynamic environments requires robots to continuously adapt their paths in response to environmental changes for safe and uninterrupted navigation. While many surveys have reviewed planning in static settings, systematic reviews focused on dynamic environments remain limited. This paper presents a comprehensive survey of 138 works, primarily published between 2015 and 2025, spanning both classical and learning-based approaches. The motion planning methods are grouped into five categories based on the concepts of sampling, graph search, model predictive control, learning, and additional classical local planning approaches, including velocity obstacles, potential fields and dynamic windows. The learning techniques include supervised learning and reinforcement learning. We also discuss the role of dynamic perception in motion planning, covering techniques for detecting and modeling moving obstacles using cameras, LiDAR, and event-based sensors. The survey analyzes the principles, strengths, and limitations of each method, with particular attention to challenges unique to dynamic environments, such as prediction uncertainty, human-robot interaction, and the freezing robot problem. The survey provides researchers with a structured understanding of motion planning methods in dynamic environments.

CLMay 21, 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

Tencent Hunyuan Team, Ao Liu, Botong Zhou et al. · tencent-ai

As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern. Faster Mamba2 ensures linear complexity, Grouped-Query Attention minimizes KV cache, and FFNs use an MoE structure. Pre-trained on 16T high-quality tokens, it supports a 256K context length and is the first industry-deployed large-scale Mamba model. Our comprehensive post-training strategy enhances capabilities via Supervised Fine-Tuning (3M instructions), a novel Adaptive Long-short CoT Fusion method, Multi-round Deliberation Learning for iterative improvement, and a two-stage Large-scale Reinforcement Learning process targeting STEM and general instruction-following. Evaluations show strong performance: overall top 7 rank on LMSYS Chatbot Arena with a score of 1356, outperforming leading models like Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345). TurboS also achieves an average of 77.9% across 23 automated benchmarks. Hunyuan-TurboS balances high performance and efficiency, offering substantial capabilities at lower inference costs than many reasoning models, establishing a new paradigm for efficient large-scale pre-trained models.

CLNov 2, 2022
Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model

Mingqi Li, Fei Ding, Dan Zhang et al.

Pre-trained multilingual language models play an important role in cross-lingual natural language understanding tasks. However, existing methods did not focus on learning the semantic structure of representation, and thus could not optimize their performance. In this paper, we propose Multi-level Multilingual Knowledge Distillation (MMKD), a novel method for improving multilingual language models. Specifically, we employ a teacher-student framework to adopt rich semantic representation knowledge in English BERT. We propose token-, word-, sentence-, and structure-level alignment objectives to encourage multiple levels of consistency between source-target pairs and correlation similarity between teacher and student models. We conduct experiments on cross-lingual evaluation benchmarks including XNLI, PAWS-X, and XQuAD. Experimental results show that MMKD outperforms other baseline models of similar size on XNLI and XQuAD and obtains comparable performance on PAWS-X. Especially, MMKD obtains significant performance gains on low-resource languages.

LGOct 12, 2022
Statistical Modeling of Soft Error Influence on Neural Networks

Haitong Huang, Xinghua Xue, Cheng Liu et al.

Soft errors in large VLSI circuits pose dramatic influence on computing- and memory-intensive neural network (NN) processing. Understanding the influence of soft errors on NNs is critical to protect against soft errors for reliable NN processing. Prior work mainly rely on fault simulation to analyze the influence of soft errors on NN processing. They are accurate but usually specific to limited configurations of errors and NN models due to the prohibitively slow simulation speed especially for large NN models and datasets. With the observation that the influence of soft errors propagates across a large number of neurons and accumulates as well, we propose to characterize the soft error induced data disturbance on each neuron with normal distribution model according to central limit theorem and develop a series of statistical models to analyze the behavior of NN models under soft errors in general. The statistical models reveal not only the correlation between soft errors and NN model accuracy, but also how NN parameters such as quantization and architecture affect the reliability of NNs. The proposed models are compared with fault simulation and verified comprehensively. In addition, we observe that the statistical models that characterize the soft error influence can also be utilized to predict fault simulation results in many cases and we explore the use of the proposed statistical models to accelerate fault simulations of NNs. According to our experiments, the accelerated fault simulation shows almost two orders of magnitude speedup with negligible simulation accuracy loss over the baseline fault simulations.

DCApr 24
Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation

Long Cheng, Ritchie Zhao, Timmy Liu et al.

Sparse-attention decoders rely on exact Top-K selection to choose the most important key-value entries for each query token. In long-context LLM serving, this Top-K stage runs once per decode query and becomes a meaningful latency bottleneck even when the indexer and attention kernels are already highly optimized. We present \textbf{Guess-Verify-Refine (GVR)}, a data-aware exact Top-K algorithm for sparse-attention decoding on NVIDIA Blackwell. GVR exploits temporal correlation across consecutive decode steps: it uses the previous step's Top-K as a prediction signal, computes pre-indexed statistics, narrows to a valid threshold by secant-style counting in 1-2 global passes, verifies candidates with a ballot-free collector, and finishes exact selection in shared memory. We connect this behavior to the Toeplitz / RoPE structure of DeepSeek Sparse Attention (DSA) indexer scores and validate the design on real DeepSeek-V3.2 workloads integrated into TensorRT-LLM. GVR achieves an average \textbf{1.88x} single-operator speedup over the production radix-select kernel, with up to \textbf{2.42x} per layer per step, while preserving bit-exact Top-K outputs. In controlled TEP8 min-latency deployment, it improves end-to-end TPOT by up to \textbf{7.52%} at 100K context, with larger gains at longer contexts and smaller but still positive gains under speculative decoding. While implemented and validated in the current TensorRT-LLM DSA stack on Blackwell, the same principle may extend to sparse-attention decoders whose decode-phase Top-K exhibits temporal stability.

ARJul 17, 2024Code
IICPilot: An Intelligent Integrated Circuit Backend Design Framework Using Open EDA

Zesong Jiang, Qing Zhang, Cheng Liu et al.

Open-source EDA tools are rapidly advancing, fostering collaboration, innovation, and knowledge sharing within the EDA community. However, the growing complexity of these tools, characterized by numerous design parameters and heuristics, poses a significant barrier to their widespread adoption. This complexity is particularly pronounced in integrated circuit (IC) backend designs, which place substantial demands on engineers' expertise in EDA tools. To tackle this challenge, we introduce IICPilot, an intelligent IC backend design system based on LLM technology. IICPilot automates various backend design procedures, including script generation, EDA tool invocation, design space exploration of EDA parameters, container-based computing resource allocation, and exception management. By automating these tasks, IICPilot significantly lowers the barrier to entry for open-source EDA tools. Specifically, IICPilot utilizes LangChain's multi-agent framework to efficiently handle distinct design tasks, enabling flexible enhancements independently. Moreover, IICPilot separates the backend design workflow from specific open-source EDA tools through a unified EDA calling interface. This approach allows seamless integration with different open-source EDA tools like OpenROAD and iEDA, streamlining the backend design and optimization across the EDA tools.

LGApr 2, 2022
Revealing the CO2 emission reduction of ridesplitting and its determinants based on real-world data

Wenxiang Li, Yuanyuan Li, Ziyuan Pu et al.

Ridesplitting, which is a form of pooled ridesourcing service, has great potential to alleviate the negative impacts of ridesourcing on the environment. However, most existing studies only explored its theoretical environmental benefits based on optimization models and simulations. By contrast, this study aims to reveal the real-world emission reduction of ridesplitting and its determinants based on the observed data of ridesourcing in Chengdu, China. Integrating the trip data with the COPERT model, this study calculates the CO2 emissions of shared rides (ridesplitting) and their substituted single rides (regular ridesourcing) to estimate the CO2 emission reduction of each ridesplitting trip. The results show that not all ridesplitting trips reduce emissions from ridesourcing in the real world. The CO2 emission reduction rate of ridesplitting varies from trip to trip, averaging at 43.15g/km. Then, interpretable machine learning models, gradient boosting machines, are applied to explore the relationship between the CO2 emission reduction rate of ridesplitting and its determinants. Based on the SHapley Additive exPlanations (SHAP) method, the overlap rate and detour rate of shared rides are identified to be the most important factors that determine the CO2 emission reduction rate of ridesplitting. Increasing the overlap rate, the number of shared rides, average speed, and ride distance ratio while decreasing the detour rate, actual trip distance, and ride distance gap can increase the CO2 emission reduction rate of ridesplitting. In addition, nonlinear effects and interactions of the determinants are examined through the partial dependence plots. To sum up, this study provides a scientific method for the government and ridesourcing companies to better assess and optimize the environmental benefits of ridesplitting.

ROJun 20, 2023
Learning Variable Impedance Skills from Demonstrations with Passivity Guarantee

Yu Zhang, Long Cheng, Xiuze Xia et al.

Robots are increasingly being deployed not only in workplaces but also in households. Effectively execute of manipulation tasks by robots relies on variable impedance control with contact forces. Furthermore, robots should possess adaptive capabilities to handle the considerable variations exhibited by different robotic tasks in dynamic environments, which can be obtained through human demonstrations. This paper presents a learning-from-demonstration framework that integrates force sensing and motion information to facilitate variable impedance control. The proposed approach involves the estimation of full stiffness matrices from human demonstrations, which are then combined with sensed forces and motion information to create a model using the non-parametric method. This model allows the robot to replicate the demonstrated task while also responding appropriately to new task conditions through the use of the state-dependent stiffness profile. Additionally, a novel tank based variable impedance control approach is proposed to ensure passivity by using the learned stiffness. The proposed approach was evaluated using two virtual variable stiffness systems. The first evaluation demonstrates that the stiffness estimated approach exhibits superior robustness compared to traditional methods when tested on manual datasets, and the second evaluation illustrates that the novel tank based approach is more easily implementable compared to traditional variable impedance control approaches.

SDMay 18
Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation

Yizhu Wen, Shuhao Zhang, Nan Zhang et al.

Retrieval-augmented text-to-music (TTM) systems augment underspecified user prompts using captions retrieved from a music caption dataset. This design introduces an integrity dependency on the music knowledge database. We show that an attacker can poison the database by injecting a small number of crafted music captions, causing the system to retrieve malicious captions that bias prompt augmentation and steer generation away from the user's intended function, without modifying the user prompt, retriever, or generator. To achieve the music caption poisoning attack, we propose a dual-layer caption poisoning strategy that preserves high-level retrieval anchors while injecting low-level acoustic descriptors to steer prompt augmentation and downstream music generation toward an attacker-chosen target intent. In a MusicCaps knowledge database, CLAP retriever, and MusicGen pipeline, poisoned generations move substantially closer to the attacker's target, while remaining comparably aligned with the original user query. These results expose a practical integrity risk for retrieval-augmented creative AI systems. Our demo can be found at: https://yizhu-wen.github.io/Mental-Damage/

ROAug 8, 2024
Enhanced Prediction of Multi-Agent Trajectories via Control Inference and State-Space Dynamics

Yu Zhang, Yongxiang Zou, Haoyu Zhang et al.

In the field of autonomous systems, accurately predicting the trajectories of nearby vehicles and pedestrians is crucial for ensuring both safety and operational efficiency. This paper introduces a novel methodology for trajectory forecasting based on state-space dynamic system modeling, which endows agents with models that have tangible physical implications. To enhance the precision of state estimations within the dynamic system, the paper also presents a novel modeling technique for control variables. This technique utilizes a newly introduced model, termed "Mixed Mamba," to derive initial control states, thereby improving the predictive accuracy of these variables. Moverover, the proposed approach ingeniously integrates graph neural networks with state-space models, effectively capturing the complexities of multi-agent interactions. This combination provides a robust and scalable framework for forecasting multi-agent trajectories across a range of scenarios. Comprehensive evaluations demonstrate that this model outperforms several established benchmarks across various metrics and datasets, highlighting its significant potential to advance trajectory forecasting in autonomous systems.

IVSep 30, 2024
Adaptive Data Transport Mechanism for UAV Surveillance Missions in Lossy Environments

Niloufar Mehrabi, Sayed Pedram Haeri Boroujeni, Jenna Hofseth et al.

Unmanned Aerial Vehicles (UAVs) play an increasingly critical role in Intelligence, Surveillance, and Reconnaissance (ISR) missions such as border patrolling and criminal detection, thanks to their ability to access remote areas and transmit real-time imagery to processing servers. However, UAVs are highly constrained by payload size, power limits, and communication bandwidth, necessitating the development of highly selective and efficient data transmission strategies. This has driven the development of various compression and optimal transmission technologies for UAVs. Nevertheless, most methods strive to preserve maximal information in transferred video frames, missing the fact that only certain parts of images/video frames might offer meaningful contributions to the ultimate mission objectives in the ISR scenarios involving moving object detection and tracking (OD/OT). This paper adopts a different perspective, and offers an alternative AI-driven scheduling policy that prioritizes selecting regions of the image that significantly contributes to the mission objective. The key idea is tiling the image into small patches and developing a deep reinforcement learning (DRL) framework that assigns higher transmission probabilities to patches that present higher overlaps with the detected object of interest, while penalizing sharp transitions over consecutive frames to promote smooth scheduling shifts. Although we used Yolov-8 object detection and UDP transmission protocols as a benchmark testing scenario the idea is general and applicable to different transmission protocols and OD/OT methods. To further boost the system's performance and avoid OD errors for cluttered image patches, we integrate it with interframe interpolations.

ARJul 17, 2024
MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs

Junfeng Gong, Cheng Liu, Long Cheng et al.

Mixed-precision neural network (MPNN) that utilizes just enough data width for the neural network processing is an effective approach to meet the stringent resources constraints including memory and computing of MCUs. Nevertheless, there is still a lack of sub-byte and mixed-precision SIMD operations in MCU-class ISA and the limited computing capability of MCUs remains underutilized, which further aggravates the computing bound encountered in neural network processing. As a result, the benefits of MPNNs cannot be fully unleashed. In this work, we propose to pack multiple low-bitwidth arithmetic operations within a single instruction multiple data (SIMD) instructions in typical MCUs, and then develop an efficient convolution operator by exploring both the data parallelism and computing parallelism in convolution along with the proposed SIMD packing. Finally, we further leverage Neural Architecture Search (NAS) to build a HW/SW co-designed MPNN design framework, namely MCU-MixQ. This framework can optimize both the MPNN quantization and MPNN implementation efficiency, striking an optimized balance between neural network performance and accuracy. According to our experiment results, MCU-MixQ achieves 2.1$\times$ and 1.4$\times$ speedup over CMix-NN and MCUNet respectively under the same resource constraints.

CVMay 15, 2025Code
PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

Long Cheng, Jiafei Duan, Yi Ru Wang et al. · uw

Pointing serves as a fundamental and intuitive mechanism for grounding language within visual contexts, with applications spanning robotics, assistive technologies, and interactive AI systems. While recent multimodal models have started to support pointing capabilities, existing benchmarks typically focus only on referential object localization tasks. We introduce PointArena, a comprehensive platform for evaluating multimodal pointing across diverse reasoning scenarios. PointArena comprises three components: (1) Point-Bench, a curated dataset containing approximately 1,000 pointing tasks across five reasoning categories; (2) Point-Battle, an interactive, web-based arena facilitating blind, pairwise model comparisons, which has already gathered over 4,500 anonymized votes; and (3) Point-Act, a real-world robotic manipulation system allowing users to directly evaluate multimodal model pointing capabilities in practical settings. We conducted extensive evaluations of both state-of-the-art open-source and proprietary multimodal models. Results indicate that Molmo-72B consistently outperforms other models, though proprietary models increasingly demonstrate comparable performance. Additionally, we find that supervised training specifically targeting pointing tasks significantly enhances model performance. Across our multi-stage evaluation pipeline, we also observe strong correlations, underscoring the critical role of precise pointing capabilities in enabling multimodal models to effectively bridge abstract reasoning with concrete, real-world actions. Project page: https://pointarena.github.io/

CRApr 27
Real-World Evaluation of Protocol-Compliant Denial-of-Service Attacks on C-V2X-based Forward Collision Warning Systems

Jean Michel Tine, Mohammed Aldeen, Abyad Enan et al.

Cellular Vehicle-to-Everything (C-V2X) technology enables low-latency, reliable communications essential for safety applications such as a Forward Collision Warning (FCW) system. C-V2X deployments operate under strict protocol compliance with the 3rd Generation Partnership Project (3GPP) and the Society of Automotive Engineers Standard (SAE) J2735 specifications to ensure interoperability. This paper presents a real-world testbed evaluation of protocol-compliant Denial-of-Service (DoS) attacks using User Datagram Protocol (UDP) flooding and oversized Basic Safety Message (BSM) attacks that 7 exploit transport- and application-layer vulnerabilities in C-V2X. The attacks presented in this study transmit valid messages over standard PC5 sidelinks, fully adhering to 3GPP and SAE J2735 specifications, but at abnormally high rates and with oversized payloads that overload the receiver resources without breaching any protocol rules such as IEEE 1609. Using a real-world connected vehicle 11 testbed with commercially available On-Board Units (OBUs), we demonstrate that high-rate UDP flooding and oversized payload of BSM flooding can severely degrade FCW performance. Results show that UDP flooding alone reduces packet delivery ratio by up to 87% and increases latency to over 400ms, while oversized BSM floods overload receiver processing resources, delaying or completely suppressing FCW alerts. When UDP and BSM attacks are executed simultaneously, they cause near-total communication failure, preventing FCW warnings entirely. These findings reveal that protocol-compliant communications do not necessarily guarantee safe or reliable operation of C-V2X-based safety applications.

CLMar 23
Towards Automated Community Notes Generation with Large Vision Language Models for Combating Contextual Deception

Jin Ma, Jingwen Yan, Mohammed Aldeen et al.

Community Notes have emerged as an effective crowd-sourced mechanism for combating online deception on social media platforms. However, its reliance on human contributors limits both the timeliness and scalability. In this work, we study the automated Community Notes generation method for image-based contextual deception, where an authentic image is paired with misleading context (e.g., time, entity, and event). Unlike prior work that primarily focuses on deception detection (i.e., judging whether a post is true or false in a binary manner), Community Notes-style systems need to generate concise and grounded notes that help users recover the missing or corrected context. This problem remains underexplored due to three reasons: (i) datasets that support the research are scarce; (ii) methods must handle the dynamic nature of contextual deception; (iii) evaluation is difficult because standard metrics do not capture whether notes actually improve user understanding. To address these gaps, we curate a real-world dataset, XCheck, comprising X posts with associated Community Notes and external contexts. We further propose the Automated Context-Corrective Note generation method, named ACCNote, which is a retrieval-augmented, multi-agent collaboration framework built on large vision-language models. Finally, we introduce a new evaluation metric, Context Helpfulness Score (CHS), that aligns with user study outcomes rather than relying on lexical overlap. Experiments on our XCheck dataset show that the proposed ACCNote improves both deception detection and note generation performance over baselines, and exceeds a commercial tool GPT5-mini. Together, our dataset, method, and metric advance practical automated generation of context-corrective notes toward more responsible online social networks.

AIApr 5Code
Solar-VLM: Multimodal Vision-Language Models for Augmented Solar Power Forecasting

Hang Fan, Haoran Pei, Runze Liang et al.

Photovoltaic (PV) power forecasting plays a critical role in power system dispatch and market participation. Because PV generation is highly sensitive to weather conditions and cloud motion, accurate forecasting requires effective modeling of complex spatiotemporal dependencies across multiple information sources. Although recent studies have advanced AI-based forecasting methods, most fail to fuse temporal observations, satellite imagery, and textual weather information in a unified framework. This paper proposes Solar-VLM, a large-language-model-driven framework for multimodal PV power forecasting. First, modality-specific encoders are developed to extract complementary features from heterogeneous inputs. The time-series encoder adopts a patch-based design to capture temporal patterns from multivariate observations at each site. The visual encoder, built upon a Qwen-based vision backbone, extracts cloud-cover information from satellite images. The text encoder distills historical weather characteristics from textual descriptions. Second, to capture spatial dependencies across geographically distributed PV stations, a cross-site feature fusion mechanism is introduced. Specifically, a Graph Learner models inter-station correlations through a graph attention network constructed over a K-nearest-neighbor (KNN) graph, while a cross-site attention module further facilitates adaptive information exchange among sites. Finally, experiments conducted on data from eight PV stations in a northern province of China demonstrate the effectiveness of the proposed framework. Our proposed model is publicly available at https://github.com/rhp413/Solar-VLM.

CVOct 13, 2025Code
AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model

Zhiwei Jin, Xiaohui Song, Nan Wang et al.

In recent years, while cloud-based MLLMs such as QwenVL, InternVL, GPT-4o, Gemini, and Claude Sonnet have demonstrated outstanding performance with enormous model sizes reaching hundreds of billions of parameters, they significantly surpass the limitations in memory, power consumption, and computing capacity of edge devices such as mobile phones. This paper introduces AndesVL, a suite of mobile-side MLLMs with 0.6B to 4B parameters based on Qwen3's LLM and various visual encoders. We comprehensively outline the model architectures, training pipeline, and training data of AndesVL, which achieves first-tier performance across a wide range of open-source benchmarks, including fields such as text-rich image understanding, reasoning and math, multi-image comprehension, general VQA, hallucination mitigation, multilingual understanding, and GUI-related tasks when compared with state-of-the-art models of a similar scale. Furthermore, we introduce a 1+N LoRA architecture alongside a Quantization-Aware LoRA Fine-Tuning (QALFT) framework to facilitate efficient task adaptation and model compression during mobile-side deployment of AndesVL. Moreover, utilizing our cache eviction algorithm -- OKV -- along with customized speculative decoding and compression strategies, we achieve a 6.7x peak decoding speedup ratio, up to 30.9% memory reduction, and 1.8 bits-per-weight when deploying AndesVL-4B on MediaTek Dimensity 9500 chips. We release all models on https://huggingface.co/OPPOer.

AIJul 6, 2025Code
Anomalous Decision Discovery using Inverse Reinforcement Learning

Ashish Bastola, Mert D. Pesé, Long Cheng et al.

Anomaly detection plays a critical role in Autonomous Vehicles (AVs) by identifying unusual behaviors through perception systems that could compromise safety and lead to hazardous situations. Current approaches, which often rely on predefined thresholds or supervised learning paradigms, exhibit reduced efficacy when confronted with unseen scenarios, sensor noise, and occlusions, leading to potential safety-critical failures. Moreover, supervised methods require large annotated datasets, limiting their real-world feasibility. To address these gaps, we propose an anomaly detection framework based on Inverse Reinforcement Learning (IRL) to infer latent driving intentions from sequential perception data, thus enabling robust identification. Specifically, we present Trajectory-Reward Guided Adaptive Pre-training (TRAP), a novel IRL framework for anomaly detection, to address two critical limitations of existing methods: noise robustness and generalization to unseen scenarios. Our core innovation is implicitly learning temporal credit assignments via reward and worst-case supervision. We leverage pre-training with variable-horizon sampling to maximize time-to-consequence, resulting in early detection of behavior deviation. Experiments on 14,000+ simulated trajectories demonstrate state-of-the-art performance, achieving 0.90 AUC and 82.2\% F1-score - outperforming similarly trained supervised and unsupervised baselines by 39\% on Recall and 12\% on F1-score, respectively. Similar performance is achieved while exhibiting robustness to various noise types and generalization to unseen anomaly types. Our code will be available at: https://github.com/abastola0/TRAP.git

CRAug 13, 2021Code
Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection

Shouguo Yang, Long Cheng, Yicheng Zeng et al.

Binary code similarity detection is a fundamental technique for many security applications such as vulnerability search, patch analysis, and malware detection. There is an increasing need to detect similar code for vulnerability search across architectures with the increase of critical vulnerabilities in IoT devices. The variety of IoT hardware architectures and software platforms requires to capture semantic equivalence of code fragments in the similarity detection. However, existing approaches are insufficient in capturing the semantic similarity. We notice that the abstract syntax tree (AST) of a function contains rich semantic information. Inspired by successful applications of natural language processing technologies in sentence semantic understanding, we propose a deep learning-based AST-encoding method, named ASTERIA, to measure the semantic equivalence of functions in different platforms. Our method leverages the Tree-LSTM network to learn the semantic representation of a function from its AST. Then the similarity detection can be conducted efficiently and accurately by measuring the similarity between two representation vectors. We have implemented an open-source prototype of ASTERIA. The Tree-LSTM model is trained on a dataset with 1,022,616 function pairs and evaluated on a dataset with 95,078 function pairs. Evaluation results show that our method outperforms the AST-based tool Diaphora and the-state-of-art method Gemini by large margins with respect to the binary similarity detection. And our method is several orders of magnitude faster than Diaphora and Gemini for the similarity calculation. In the application of vulnerability search, our tool successfully identified 75 vulnerable functions in 5,979 IoT firmware images.

ROMay 6
From Reach to Insert: Tactile-Augmented Precision Assembly under Sub-Millimeter Tolerances

Xinpan Meng, Siyao Huang, JingPu Yang et al.

High-precision assembly frequently involves tight-tolerance insertions, where even slight pose errors can cause jamming or excessive interaction forces, making robust and safe insertion policies difficult to obtain. This paper proposes a tactile-augmented two-stage method that combines Imitation Learning (IL) and Reinforcement Learning (RL) for precision insertion tasks. In the first stage, IL learns a reaching policy with position generalization that grasps the peg and brings it to the vicinity of the target region. In the second stage, RL executes the insertion and enables recovery from failures during contact-rich interactions. To better exploit tactile feedback, we introduce tactile group sampling to increase coverage of critical contact segments during training, and design a tactile critic to more accurately evaluate policy values, improving insertion performance while maintaining low contact forces. We conduct systematic experiments across five hole geometries and three clearance settings. Results show that our method substantially improves insertion performance across all settings; under the most challenging 0.05\,mm clearance, it achieves a 67\% success rate while keeping contact forces low, reducing the maximum interaction force by 60\% and torque by 44\%, thereby validating both effectiveness and safety for precision assembly.

CLDec 22, 2023
Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models

Nishant Vishwamitra, Keyan Guo, Farhan Tajwar Romit et al.

Online hate is an escalating problem that negatively impacts the lives of Internet users, and is also subject to rapid changes due to evolving events, resulting in new waves of online hate that pose a critical threat. Detecting and mitigating these new waves present two key challenges: it demands reasoning-based complex decision-making to determine the presence of hateful content, and the limited availability of training samples hinders updating the detection model. To address this critical issue, we present a novel framework called HATEGUARD for effectively moderating new waves of online hate. HATEGUARD employs a reasoning-based approach that leverages the recently introduced chain-of-thought (CoT) prompting technique, harnessing the capabilities of large language models (LLMs). HATEGUARD further achieves prompt-based zero-shot detection by automatically generating and updating detection prompts with new derogatory terms and targets in new wave samples to effectively address new waves of online hate. To demonstrate the effectiveness of our approach, we compile a new dataset consisting of tweets related to three recently witnessed new waves: the 2022 Russian invasion of Ukraine, the 2021 insurrection of the US Capitol, and the COVID-19 pandemic. Our studies reveal crucial longitudinal patterns in these new waves concerning the evolution of events and the pressing need for techniques to rapidly update existing moderation tools to counteract them. Comparative evaluations against state-of-the-art tools illustrate the superiority of our framework, showcasing a substantial 22.22% to 83.33% improvement in detecting the three new waves of online hate. Our work highlights the severe threat posed by the emergence of new waves of online hate and represents a paradigm shift in addressing this threat practically.

DCJan 2, 2025
Deep Reinforcement Learning for Job Scheduling and Resource Management in Cloud Computing: An Algorithm-Level Review

Yan Gu, Zhaoze Liu, Shuhong Dai et al.

Cloud computing has revolutionized the provisioning of computing resources, offering scalable, flexible, and on-demand services to meet the diverse requirements of modern applications. At the heart of efficient cloud operations are job scheduling and resource management, which are critical for optimizing system performance and ensuring timely and cost-effective service delivery. However, the dynamic and heterogeneous nature of cloud environments presents significant challenges for these tasks, as workloads and resource availability can fluctuate unpredictably. Traditional approaches, including heuristic and meta-heuristic algorithms, often struggle to adapt to these real-time changes due to their reliance on static models or predefined rules. Deep Reinforcement Learning (DRL) has emerged as a promising solution to these challenges by enabling systems to learn and adapt policies based on continuous observations of the environment, facilitating intelligent and responsive decision-making. This survey provides a comprehensive review of DRL-based algorithms for job scheduling and resource management in cloud computing, analyzing their methodologies, performance metrics, and practical applications. We also highlight emerging trends and future research directions, offering valuable insights into leveraging DRL to advance both job scheduling and resource management in cloud computing.

AROct 16, 2024
COMET: Towards Partical W4A4KV4 LLMs Serving

Lian Liu, Haimeng Ren, Long Cheng et al.

Quantization is a widely-used compression technology to reduce the overhead of serving large language models (LLMs) on terminal devices and in cloud data centers. However, prevalent quantization methods, such as 8-bit weight-activation or 4-bit weight-only quantization, achieve limited performance improvements due to poor support for low-precision (e.g., 4-bit) activation. This work, for the first time, realizes practical W4A4KV4 serving for LLMs, fully utilizing the INT4 tensor cores on modern GPUs and reducing the memory bottleneck caused by the KV cache. Specifically, we propose a novel fine-grained mixed-precision quantization algorithm (FMPQ) that compresses most activations into 4-bit with negligible accuracy loss. To support mixed-precision matrix multiplication for W4A4 and W4A8, we develop a highly optimized W4Ax kernel. Our approach introduces a novel mixed-precision data layout to facilitate access and fast dequantization for activation and weight tensors, utilizing the GPU's software pipeline to hide the overhead of data loading and conversion. Additionally, we propose fine-grained streaming multiprocessor (SM) scheduling to achieve load balance across different SMs. We integrate the optimized W4Ax kernel into our inference framework, COMET, and provide efficient management to support popular LLMs such as LLaMA-3-70B. Extensive evaluations demonstrate that, when running LLaMA family models on a single A100-80G-SMX4, COMET achieves a kernel-level speedup of \textbf{$2.88\times$} over cuBLAS and a \textbf{$2.02 \times$} throughput improvement compared to TensorRT-LLM from an end-to-end framework perspective.

MAApr 11, 2025
Graph Based Deep Reinforcement Learning Aided by Transformers for Multi-Agent Cooperation

Michael Elrod, Niloufar Mehrabi, Rahul Amin et al.

Mission planning for a fleet of cooperative autonomous drones in applications that involve serving distributed target points, such as disaster response, environmental monitoring, and surveillance, is challenging, especially under partial observability, limited communication range, and uncertain environments. Traditional path-planning algorithms struggle in these scenarios, particularly when prior information is not available. To address these challenges, we propose a novel framework that integrates Graph Neural Networks (GNNs), Deep Reinforcement Learning (DRL), and transformer-based mechanisms for enhanced multi-agent coordination and collective task execution. Our approach leverages GNNs to model agent-agent and agent-goal interactions through adaptive graph construction, enabling efficient information aggregation and decision-making under constrained communication. A transformer-based message-passing mechanism, augmented with edge-feature-enhanced attention, captures complex interaction patterns, while a Double Deep Q-Network (Double DQN) with prioritized experience replay optimizes agent policies in partially observable environments. This integration is carefully designed to address specific requirements of multi-agent navigation, such as scalability, adaptability, and efficient task execution. Experimental results demonstrate superior performance, with 90% service provisioning and 100% grid coverage (node discovery), while reducing the average steps per episode to 200, compared to 600 for benchmark methods such as particle swarm optimization (PSO), greedy algorithms and DQN.

SEMar 25, 2025
VecTrans: Enhancing Compiler Auto-Vectorization through LLM-Assisted Code Transformations

Zhongchun Zheng, Kan Wu, Long Cheng et al.

Auto-vectorization is a fundamental optimization for modern compilers to exploit SIMD parallelism. However, state-of-the-art approaches still struggle to handle intricate code patterns, often requiring manual hints or domain-specific expertise. Large language models (LLMs), with their ability to capture intricate patterns, provide a promising solution, yet their effective application in compiler optimizations remains an open challenge due to issues such as hallucinations and a lack of domain-specific reasoning. In this paper, we present VecTrans, a novel framework that leverages LLMs to enhance compiler-based code vectorization. VecTrans first employs compiler analysis to identify potentially vectorizable code regions. It then utilizes an LLM to refactor these regions into patterns that are more amenable to the compilers auto-vectorization. To ensure semantic correctness, VecTrans further integrates a hybrid validation mechanism at the intermediate representation (IR) level. With the above efforts, VecTrans combines the adaptability of LLMs with the precision of compiler vectorization, thereby effectively opening up the vectorization opportunities. experimental results show that among all TSVC functions unvectorizable by GCC, ICC, Clang, and BiSheng Compiler, VecTrans achieves an geomean speedup of 1.77x and successfully vectorizes 24 of 51 test cases. This marks a significant advancement over state-of-the-art approaches while maintaining a cost efficiency of $0.012 per function optimization for LLM API usage.

CYMay 13, 2024
AI-Cybersecurity Education Through Designing AI-based Cyberharassment Detection Lab

Ebuka Okpala, Nishant Vishwamitra, Keyan Guo et al.

Cyberharassment is a critical, socially relevant cybersecurity problem because of the adverse effects it can have on targeted groups or individuals. While progress has been made in understanding cyber-harassment, its detection, attacks on artificial intelligence (AI) based cyberharassment systems, and the social problems in cyberharassment detectors, little has been done in designing experiential learning educational materials that engage students in this emerging social cybersecurity in the era of AI. Experiential learning opportunities are usually provided through capstone projects and engineering design courses in STEM programs such as computer science. While capstone projects are an excellent example of experiential learning, given the interdisciplinary nature of this emerging social cybersecurity problem, it can be challenging to use them to engage non-computing students without prior knowledge of AI. Because of this, we were motivated to develop a hands-on lab platform that provided experiential learning experiences to non-computing students with little or no background knowledge in AI and discussed the lessons learned in developing this lab. In this lab used by social science students at North Carolina A&T State University across two semesters (spring and fall) in 2022, students are given a detailed lab manual and are to complete a set of well-detailed tasks. Through this process, students learn AI concepts and the application of AI for cyberharassment detection. Using pre- and post-surveys, we asked students to rate their knowledge or skills in AI and their understanding of the concepts learned. The results revealed that the students moderately understood the concepts of AI and cyberharassment.

ROSep 13, 2025
ViSTR-GP: Online Cyberattack Detection via Vision-to-State Tensor Regression and Gaussian Processes in Automated Robotic Operations

Navid Aftabi, Philip Samaha, Jin Ma et al.

Industrial robotic systems are central to automating smart manufacturing operations. Connected and automated factories face growing cybersecurity risks that can potentially cause interruptions and damages to physical operations. Among these attacks, data-integrity attacks often involve sophisticated exploitation of vulnerabilities that enable an attacker to access and manipulate the operational data and are hence difficult to detect with only existing intrusion detection or model-based detection. This paper addresses the challenges in utilizing existing side-channels to detect data-integrity attacks in robotic manufacturing processes by developing an online detection framework, ViSTR-GP, that cross-checks encoder-reported measurements against a vision-based estimate from an overhead camera outside the controller's authority. In this framework, a one-time interactive segmentation initializes SAM-Track to generate per-frame masks. A low-rank tensor-regression surrogate maps each mask to measurements, while a matrix-variate Gaussian process models nominal residuals, capturing temporal structure and cross-joint correlations. A frame-wise test statistic derived from the predictive distribution provides an online detector with interpretable thresholds. We validate the framework on a real-world robotic testbed with synchronized video frame and encoder data, collecting multiple nominal cycles and constructing replay attack scenarios with graded end-effector deviations. Results on the testbed indicate that the proposed framework recovers joint angles accurately and detects data-integrity attacks earlier with more frequent alarms than all baselines. These improvements are most evident in the most subtle attacks. These results show that plants can detect data-integrity attacks by adding an independent physical channel, bypassing the controller's authority, without needing complex instrumentation.

CVSep 4, 2025
DisPatch: Disarming Adversarial Patches in Object Detection with Diffusion Models

Jin Ma, Mohammed Aldeen, Christopher Salas et al.

Object detection is fundamental to various real-world applications, such as security monitoring and surveillance video analysis. Despite their advancements, state-of-theart object detectors are still vulnerable to adversarial patch attacks, which can be easily applied to real-world objects to either conceal actual items or create non-existent ones, leading to severe consequences. Given the current diversity of adversarial patch attacks and potential unknown threats, an ideal defense method should be effective, generalizable, and robust against adaptive attacks. In this work, we introduce DISPATCH, the first diffusion-based defense framework for object detection. Unlike previous works that aim to "detect and remove" adversarial patches, DISPATCH adopts a "regenerate and rectify" strategy, leveraging generative models to disarm attack effects while preserving the integrity of the input image. Specifically, we utilize the in-distribution generative power of diffusion models to regenerate the entire image, aligning it with benign data. A rectification process is then employed to identify and replace adversarial regions with their regenerated benign counterparts. DISPATCH is attack-agnostic and requires no prior knowledge of the existing patches. Extensive experiments across multiple detectors and attacks demonstrate that DISPATCH consistently outperforms state-of-the-art defenses on both hiding attacks and creating attacks, achieving the best overall mAP.5 score of 89.3% on hiding attacks, and lowering the attack success rate to 24.8% on untargeted creating attacks. Moreover, it maintains strong robustness against adaptive attacks, making it a practical and reliable defense for object detection systems.

CVAug 4, 2025
Understanding the Risks of Asphalt Art on the Reliability of Surveillance Perception Systems

Jin Ma, Abyad Enan, Long Cheng et al.

Artistic crosswalks featuring asphalt art, introduced by different organizations in recent years, aim to enhance the visibility and safety of pedestrians. However, their visual complexity may interfere with surveillance systems that rely on vision-based object detection models. In this study, we investigate the impact of asphalt art on pedestrian detection performance of a pretrained vision-based object detection model. We construct realistic crosswalk scenarios by compositing various street art patterns into a fixed surveillance scene and evaluate the model's performance in detecting pedestrians on asphalt-arted crosswalks under both benign and adversarial conditions. A benign case refers to pedestrian crosswalks painted with existing normal asphalt art, whereas an adversarial case involves digitally crafted or altered asphalt art perpetrated by an attacker. Our results show that while simple, color-based designs have minimal effect, complex artistic patterns, particularly those with high visual salience, can significantly degrade pedestrian detection performance. Furthermore, we demonstrate that adversarially crafted asphalt art can be exploited to deliberately obscure real pedestrians or generate non-existent pedestrian detections. These findings highlight a potential vulnerability in urban vision-based pedestrian surveillance systems and underscore the importance of accounting for environmental visual variations when designing robust pedestrian perception models.

AIMay 19, 2025
Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities

Lili Zhang, Haomiaomiao Wang, Long Cheng et al.

As Large Language Models (LLMs) become increasingly integrated into real-world decision-making systems, understanding their behavioural vulnerabilities remains a critical challenge for AI safety and alignment. While existing evaluation metrics focus primarily on reasoning accuracy or factual correctness, they often overlook whether LLMs are robust to adversarial manipulation or capable of using adaptive strategy in dynamic environments. This paper introduces an adversarial evaluation framework designed to systematically stress-test the decision-making processes of LLMs under interactive and adversarial conditions. Drawing on methodologies from cognitive psychology and game theory, our framework probes how models respond in two canonical tasks: the two-armed bandit task and the Multi-Round Trust Task. These tasks capture key aspects of exploration-exploitation trade-offs, social cooperation, and strategic flexibility. We apply this framework to several state-of-the-art LLMs, including GPT-3.5, GPT-4, Gemini-1.5, and DeepSeek-V3, revealing model-specific susceptibilities to manipulation and rigidity in strategy adaptation. Our findings highlight distinct behavioral patterns across models and emphasize the importance of adaptability and fairness recognition for trustworthy AI deployment. Rather than offering a performance benchmark, this work proposes a methodology for diagnosing decision-making weaknesses in LLM-based agents, providing actionable insights for alignment and safety research.

LGFeb 26, 2025
Online Pseudo-average Shifting Attention(PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis

Long Cheng, Qichen Liao, Fan Wu et al.

Attention calculation is extremely time-consuming for long-sequence inference tasks, such as text or image/video generation, in large models. To accelerate this process, we developed a low-precision, mathematically-equivalent algorithm called PASA, based on Flash Attention. PASA introduces two novel techniques: online pseudo-average shifting and global recovering. These techniques enable the use of half-precision computation throughout the Flash Attention process without incurring overflow instability or unacceptable numerical accuracy loss. This algorithm enhances performance on memory-restricted AI hardware architectures, such as the Ascend Neural-network Processing Unit(NPU), by reducing data movement and increasing computational FLOPs. The algorithm is validated using both designed random benchmarks and real large models. We find that the large bias and amplitude of attention input data are critical factors contributing to numerical overflow ($>65504$ for half precision) in two different categories of large models (Qwen2-7B language models and Stable-Video-Diffusion multi-modal models). Specifically, overflow arises due to the large bias in the sequence dimension and the resonance mechanism between the query and key in the head dimension of the Stable-Video-Diffusion models. The resonance mechanism is defined as phase coincidence or 180-degree phase shift between query and key matrices. It will remarkably amplify the element values of attention score matrix. This issue also applies to the Qwen models. Additionally, numerical accuracy is assessed through root mean square error (RMSE) and by comparing the final generated texts and videos to those produced using high-precision attention.

AIJan 31, 2022
CoTV: Cooperative Control for Traffic Light Signals and Connected Autonomous Vehicles using Deep Reinforcement Learning

Jiaying Guo, Long Cheng, Shen Wang

The target of reducing travel time only is insufficient to support the development of future smart transportation systems. To align with the United Nations Sustainable Development Goals (UN-SDG), a further reduction of fuel and emissions, improvements of traffic safety, and the ease of infrastructure deployment and maintenance should also be considered. Different from existing work focusing on the optimization of the control in either traffic light signal (to improve the intersection throughput), or vehicle speed (to stabilize the traffic), this paper presents a multi-agent Deep Reinforcement Learning (DRL) system called CoTV, which Cooperatively controls both Traffic light signals and Connected Autonomous Vehicles (CAV). Therefore, our CoTV can well balance the achievement of the reduction of travel time, fuel, and emissions. In the meantime, CoTV can also be easy to deploy by cooperating with only one CAV that is the nearest to the traffic light controller on each incoming road. This enables more efficient coordination between traffic light controllers and CAV, thus leading to the convergence of training CoTV under the large-scale multi-agent scenario that is traditionally difficult to converge. We give the detailed system design of CoTV and demonstrate its effectiveness in a simulation study using SUMO under various grid maps and realistic urban scenarios with mixed-autonomy traffic.

LGDec 22, 2021
Understanding and Measuring Robustness of Multimodal Learning

Nishant Vishwamitra, Hongxin Hu, Ziming Zhao et al.

The modern digital world is increasingly becoming multimodal. Although multimodal learning has recently revolutionized the state-of-the-art performance in multimodal tasks, relatively little is known about the robustness of multimodal learning in an adversarial setting. In this paper, we introduce a comprehensive measurement of the adversarial robustness of multimodal learning by focusing on the fusion of input modalities in multimodal models, via a framework called MUROAN (MUltimodal RObustness ANalyzer). We first present a unified view of multimodal models in MUROAN and identify the fusion mechanism of multimodal models as a key vulnerability. We then introduce a new type of multimodal adversarial attacks called decoupling attack in MUROAN that aims to compromise multimodal models by decoupling their fused modalities. We leverage the decoupling attack of MUROAN to measure several state-of-the-art multimodal models and find that the multimodal fusion mechanism in all these models is vulnerable to decoupling attacks. We especially demonstrate that, in the worst case, the decoupling attack of MUROAN achieves an attack success rate of 100% by decoupling just 1.16% of the input space. Finally, we show that traditional adversarial training is insufficient to improve the robustness of multimodal models with respect to decoupling attacks. We hope our findings encourage researchers to pursue improving the robustness of multimodal learning.

ARJul 7, 2021
R2F: A Remote Retraining Framework for AIoT Processors with Computing Errors

Dawen Xu, Meng He, Cheng Liu et al.

AIoT processors fabricated with newer technology nodes suffer rising soft errors due to the shrinking transistor sizes and lower power supply. Soft errors on the AIoT processors particularly the deep learning accelerators (DLAs) with massive computing may cause substantial computing errors. These computing errors are difficult to be captured by the conventional training on general purposed processors like CPUs and GPUs in a server. Applying the offline trained neural network models to the edge accelerators with errors directly may lead to considerable prediction accuracy loss. To address the problem, we propose a remote retraining framework (R2F) for remote AIoT processors with computing errors. It takes the remote AIoT processor with soft errors in the training loop such that the on-site computing errors can be learned with the application data on the server and the retrained models can be resilient to the soft errors. Meanwhile, we propose an optimized partial TMR strategy to enhance the retraining. According to our experiments, R2F enables elastic design trade-offs between the model accuracy and the performance penalty. The top-5 model accuracy can be improved by 1.93%-13.73% with 0%-200% performance penalty at high fault error rate. In addition, we notice that the retraining requires massive data transmission and even dominates the training time, and propose a sparse increment compression approach for the data transmission optimization, which reduces the retraining time by 38%-88% on average with negligible accuracy loss over a straightforward remote retraining.

CVSep 14, 2020
3D Object Detection and Tracking Based on Streaming Data

Xusen Guo, Jiangfeng Gu, Silu Guo et al.

Recent approaches for 3D object detection have made tremendous progresses due to the development of deep learning. However, previous researches are mostly based on individual frames, leading to limited exploitation of information between frames. In this paper, we attempt to leverage the temporal information in streaming data and explore 3D streaming based object detection as well as tracking. Toward this goal, we set up a dual-way network for 3D object detection based on keyframes, and then propagate predictions to non-key frames through a motion based interpolation algorithm guided by temporal information. Our framework is not only shown to have significant improvements on object detection compared with frame-by-frame paradigm, but also proven to produce competitive results on KITTI Object Tracking Benchmark, with 76.68% in MOTA and 81.65% in MOTP respectively.

CRJul 29, 2020
Measuring the Effectiveness of Privacy Policies for Voice Assistant Applications

Song Liao, Christin Wilson, Long Cheng et al.

Voice Assistants (VA) such as Amazon Alexa and Google Assistant are quickly and seamlessly integrating into people's daily lives. The increased reliance on VA services raises privacy concerns such as the leakage of private conversations and sensitive information. Privacy policies play an important role in addressing users' privacy concerns and informing them about the data collection, storage, and sharing practices. VA platforms (both Amazon Alexa and Google Assistant) allow third-party developers to build new voice-apps and publish them to the app store. Voice-app developers are required to provide privacy policies to disclose their apps' data practices. However, little is known whether these privacy policies are informative and trustworthy or not on emerging VA platforms. On the other hand, many users invoke voice-apps through voice and thus there exists a usability challenge for users to access these privacy policies. In this paper, we conduct the first large-scale data analytics to systematically measure the effectiveness of privacy policies provided by voice-app developers on two mainstream VA platforms. We seek to understand the quality and usability issues of privacy policies provided by developers in the current app stores. We analyzed 64,720 Amazon Alexa skills and 2,201 Google Assistant actions. Our work also includes a user study to understand users' perspectives on VA's privacy policies. Our findings reveal a worrisome reality of privacy policies in two mainstream voice-app stores, where there exists a substantial number of problematic privacy policies. Surprisingly, Google and Amazon even have official voice-apps violating their own requirements regarding the privacy policy.

CRMar 30, 2020
Deep Learning-Based Anomaly Detection in Cyber-Physical Systems: Progress and Opportunities

Yuan Luo, Ya Xiao, Long Cheng et al.

Anomaly detection is crucial to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of CPSs and more sophisticated attacks, conventional anomaly detection methods, which face the growing volume of data and need domain-specific knowledge, cannot be directly applied to address these challenges. To this end, deep learning-based anomaly detection (DLAD) methods have been proposed. In this paper, we review state-of-the-art DLAD methods in CPSs. We propose a taxonomy in terms of the type of anomalies, strategies, implementation, and evaluation metrics to understand the essential properties of current methods. Further, we utilize this taxonomy to identify and highlight new characteristics and designs in each CPS domain. Also, we discuss the limitations and open problems of these methods. Moreover, to give users insights into choosing proper DLAD methods in practice, we experimentally explore the characteristics of typical neural models, the workflow of DLAD methods, and the running performance of DL models. Finally, we discuss the deficiencies of DL approaches, our findings, and possible directions to improve DLAD methods and motivate future research.

CRFeb 22, 2019
Exploitation Techniques and Defenses for Data-Oriented Attacks

Long Cheng, Hans Liljestrand, Thomas Nyman et al.

Data-oriented attacks manipulate non-control data to alter a program's benign behavior without violating its control-flow integrity. It has been shown that such attacks can cause significant damage even in the presence of control-flow defense mechanisms. However, these threats have not been adequately addressed. In this SoK paper, we first map data-oriented exploits, including Data-Oriented Programming (DOP) attacks, to their assumptions/requirements and attack capabilities. We also compare known defenses against these attacks, in terms of approach, detection capabilities, overhead, and compatibility. Then, we experimentally assess the feasibility of a detection approach that is based on the Intel Processor Trace (PT) technology. PT only traces control flows, thus, is generally believed to be not useful for data-oriented security. However, our work reveals that data-oriented attacks (in particular the recent DOP attacks) may generate side-effects on control-flow behavior in multiple dimensions, which manifest in PT traces. Based on this evaluation, we discuss challenges for building deployable data-oriented defenses and open research questions.

CRApr 30, 2018
Checking is Believing: Event-Aware Program Anomaly Detection in Cyber-Physical Systems

Long Cheng, Ke Tian, Danfeng Yao et al.

Securing cyber-physical systems (CPS) against malicious attacks is of paramount importance because these attacks may cause irreparable damages to physical systems. Recent studies have revealed that control programs running on CPS devices suffer from both control-oriented attacks (e.g., code-injection or code-reuse attacks) and data-oriented attacks (e.g., non-control data attacks). Unfortunately, existing detection mechanisms are insufficient to detect runtime data-oriented exploits, due to the lack of runtime execution semantics checking. In this work, we propose Orpheus, a new security methodology for defending against data-oriented attacks by enforcing cyber-physical execution semantics. We first present a general method for reasoning cyber-physical execution semantics of a control program (i.e., causal dependencies between the physical context and program control flows), including the event identification and dependence analysis. As an instantiation of Orpheus, we then present a new program behavior model, i.e., the event-aware finite-state automaton (eFSA). eFSA takes advantage of the event-driven nature of CPS control programs and incorporates event checking in anomaly detection. It detects data-oriented exploits if a specific physical event is missing along with the corresponding event dependent state transition. We evaluate our prototype's performance by conducting case studies under data-oriented attacks. Results show that eFSA can successfully detect different runtime attacks. Our prototype on Raspberry Pi incurs a low overhead, taking 0.0001s for each state transition integrity checking, and 0.063s~0.211s for the cyber-physical contextual consistency checking.