CV · Feb 13, 2023
Threatening Patch Attacks on Object Detection in Optical Remote Sensing Images
Xuxiang Sun, Gong Cheng, Lei Pei et al.
Advanced Patch Attacks (PAs) on object detection in natural images have exposed a serious safety vulnerability in methods based on deep neural networks. However, little attention has been paid to this topic in Optical Remote Sensing Images (O-RSIs). To this end, we study PAs on object detection in O-RSIs and propose a more Threatening PA that does not sacrifice visual quality, dubbed TPA. Specifically, to address the problem of inconsistency between local and global landscapes in existing patch selection schemes, we propose leveraging the First-Order Difference (FOD) of the objective function before and after masking to select the sub-patches to be attacked. Further, considering the problem of gradient inundation when applying existing coordinate-based losses to PAs directly, we design an IoU-based objective function specific to PAs, dubbed Bounding box Drifting Loss (BDL), which pushes the detected bounding boxes away from the initial ones until there is no intersection between them. Finally, on two widely used benchmarks, i.e., DIOR and DOTA, comprehensive evaluations of our TPA with four typical detectors (Faster R-CNN, FCOS, RetinaNet, and YOLO-v4) demonstrate its effectiveness. To the best of our knowledge, this is the first attempt to study PAs on object detection in O-RSIs, and we hope this work will encourage readers to study this topic.
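The abstract describes BDL as IoU-based and as driving detected boxes away from their initial positions until they no longer intersect. The exact loss is not given here, so the following is only a minimal sketch of the underlying quantity: the IoU between an initial box and a current detection, plus a stopping check. The corner format `(x1, y1, x2, y2)` and the names `iou` and `has_drifted` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def has_drifted(initial: Box, current: Box) -> bool:
    """Hypothetical stopping check: True once the boxes no longer intersect."""
    return iou(initial, current) == 0.0
```

An attack loop in this spirit would minimize `iou(initial, current)` with respect to the patch pixels and could stop for a given box once `has_drifted` returns `True`.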
CV · Mar 12
Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection
Liang Tang, Hongda Li, Jiayu Zhang et al.
Emotion recognition in videos is a pivotal task in affective computing, where identifying subtle psychological states such as Ambivalence and Hesitancy holds significant value for behavioral intervention and digital health. Ambivalence and Hesitancy states often manifest through cross-modal inconsistencies such as discrepancies between facial expressions, vocal tones, and textual semantics, posing a substantial challenge for automated recognition. This paper proposes a recognition framework that integrates temporal segment modeling with Multimodal Large Language Models. To address computational efficiency and token constraints in long video processing, we employ a segment-based strategy, partitioning videos into short clips with a maximum duration of 5 seconds. We leverage the Qwen3-Omni-30B-A3B model, fine-tuned on the BAH dataset using LoRA and full-parameter strategies via the MS-Swift framework, enabling the model to synergistically analyze visual and auditory signals. Experimental results demonstrate that the proposed method achieves an accuracy of 85.1% on the test set, significantly outperforming existing benchmarks and validating the superior capability of Multimodal Large Language Models in capturing complex and nuanced emotional conflicts. The code is released at https://github.com/dlnn123/A-H-Detection-with-Qwen-Omni.git.
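The segment-based strategy partitions each video into clips of at most 5 seconds. As a minimal sketch of that step only, the function below computes clip boundaries from a video duration. The name `segment_bounds` and the choice to keep a shorter final clip are assumptions; the abstract does not say how the remainder is handled.

```python
import math
from typing import List, Tuple


def segment_bounds(duration: float, max_len: float = 5.0) -> List[Tuple[float, float]]:
    """Split [0, duration] into consecutive clips no longer than max_len seconds."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if duration <= 0:
        return []

    n_clips = math.ceil(duration / max_len)
    bounds = []
    for i in range(n_clips):
        start = i * max_len
        # The last clip is truncated at the end of the video (assumed behavior).
        end = min((i + 1) * max_len, duration)
        bounds.append((start, end))
    return bounds
```

For example, a 12-second video yields three clips, the last one 2 seconds long.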
CR · May 2
Phishing Detection in Ethereum via Temporal Graph Contrastive Learning
Cong Wu, Jing Chen, Siqi Lin et al.
Blockchain and decentralized finance have revolutionized the financial ecosystem while simultaneously exposing it to cryptocurrency phishing attacks. Existing phishing detection methods primarily rely on graph learning, but they face significant limitations. Static graph learning approaches fail to account for the temporal evolution of phishing patterns, while semi-dynamic methods, such as those combining static GNNs with LSTM, struggle to capture the irregular and bursty nature of blockchain transactions. Moreover, these methods overlook the diversity of Ethereum transactions, treating them as homogeneous graphs, and heavily rely on supervised learning, which requires extensive labeled data that is not readily available. These limitations reduce their adaptability to emerging phishing threats. In this paper, we present PhishEye, a fully dynamic self-supervised system that monitors on-chain transactions to detect phishing activities. PhishEye formulates Ethereum transactions as a heterogeneous temporal attributed multi-graph and incorporates a novel temporal graph contrastive learning model, which captures both temporal patterns and heterogeneous transaction types. The evaluation on a dataset of 161,658 addresses and 416,541 transactions shows that PhishEye outperforms existing methods, achieving an F1 score of 87.23% and an AUC of 98.43% for phishing transaction detection, and an F1 score of 94.19% and an AUC of 98.03% for phishing account detection. In real-world deployment from May 1, 2023 to July 31, 2024, PhishEye identified 1,803 previously unknown phishing addresses, providing early alerts that helped prevent losses exceeding 2 billion USD.
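PhishEye models transactions as a heterogeneous temporal attributed multi-graph. To make that term concrete, here is a small sketch of such a structure: several timestamped, typed edges may connect the same pair of addresses. The class and field names (`TxEdge`, `TemporalMultiGraph`, `tx_type`, `value`) are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TxEdge:
    src: str        # sender address
    dst: str        # receiver address
    timestamp: int  # e.g. block time in seconds (assumed unit)
    tx_type: str    # heterogeneous edge type, e.g. "transfer" or "contract_call"
    value: float    # edge attribute, e.g. transferred amount


class TemporalMultiGraph:
    """Multi-graph: parallel edges between the same address pair are all kept."""

    def __init__(self) -> None:
        self._edges: Dict[Tuple[str, str], List[TxEdge]] = defaultdict(list)

    def add(self, edge: TxEdge) -> None:
        self._edges[(edge.src, edge.dst)].append(edge)

    def between(self, src: str, dst: str) -> List[TxEdge]:
        """All edges from src to dst, ordered by time."""
        return sorted(self._edges.get((src, dst), []), key=lambda e: e.timestamp)

    def history(self, address: str) -> List[TxEdge]:
        """Every edge touching an address, ordered by time."""
        touching = [e for edges in self._edges.values() for e in edges
                    if address in (e.src, e.dst)]
        return sorted(touching, key=lambda e: e.timestamp)
```

Keeping per-edge timestamps, rather than aggregating edges into snapshots, is what lets a model see irregular and bursty activity between two addresses.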
LG · Dec 29, 2024
Safe Bayesian Optimization for the Control of High-Dimensional Embodied Systems
Yunyue Wei, Zeji Yi, Hongda Li et al.
Learning to move is a primary goal for animals and robots, and ensuring safety is often important when optimizing control policies on embodied systems. For complex tasks such as the control of humans or humanoids, the high-dimensional parameter space adds complexity to the safe optimization effort. Current safe exploration algorithms are inefficient and may even become infeasible in large, high-dimensional input spaces. Furthermore, existing high-dimensional constrained optimization methods neglect safety in the search process. In this paper, we propose High-dimensional Safe Bayesian Optimization with local optimistic exploration (HdSafeBO), a novel approach designed to handle high-dimensional sampling problems under probabilistic safety constraints. We introduce a local optimistic strategy to efficiently and safely optimize the objective function, providing a probabilistic safety guarantee and a cumulative safety violation bound. Through the use of isometric embedding, HdSafeBO addresses problems ranging from a few hundred to several thousand dimensions while maintaining safety guarantees. To our knowledge, HdSafeBO is the first algorithm capable of optimizing the control of high-dimensional musculoskeletal systems with high safety probability. We also demonstrate the real-world applicability of HdSafeBO through its use in the safe online optimization of human motion control induced by neural stimulation.
CV · Sep 27, 2018
Vision-based Navigation of Autonomous Vehicle in Roadway Environments with Unexpected Hazards
Mhafuzul Islam, Mashrur Chowdhury, Hongda Li et al.
Vision-based navigation of autonomous vehicles primarily depends on Deep Neural Network (DNN) based systems, in which the controller obtains input from sensors/detectors, such as cameras, and produces a vehicle control output, such as a steering wheel angle, to navigate the vehicle safely in a roadway traffic environment. Typically, these DNN-based systems are trained through supervised learning; however, recent studies show that a trained DNN-based system can be compromised by perturbation or adversarial inputs. Similarly, such perturbation can be introduced into the DNN-based systems of an autonomous vehicle by unexpected roadway hazards, such as debris and roadblocks. In this study, we first introduce a hazardous roadway environment (with both intentional and unintentional roadway hazards) that can compromise the DNN-based navigational system of an autonomous vehicle and produce an incorrect steering wheel angle, which can cause crashes resulting in fatality and injury. Then, we develop a DNN-based autonomous vehicle driving system using object detection and semantic segmentation to mitigate the adverse effect of this type of hazardous environment, which helps the autonomous vehicle navigate safely around such hazards. We find that our DNN-based autonomous vehicle driving system, which includes hazardous object detection and semantic segmentation, improves the navigational ability of an autonomous vehicle to avoid a potential hazard by 21% compared to the traditional DNN-based autonomous vehicle driving system.
NI · Nov 29, 2017
Cybersecurity Attacks in Vehicle-to-Infrastructure (V2I) Applications and their Prevention
Mhafuzul Islam, Mashrur Chowdhury, Hongda Li et al.
A connected vehicle (CV) environment is composed of diverse data collection, data communication and dissemination, and computing infrastructure systems that are vulnerable to the same cyberattacks as all traditional computing environments. Cyberattacks can jeopardize the expected safety, mobility, energy, and environmental benefits of connected vehicle applications. Because cyberattacks can lead to severe traffic incidents, they have become one of the primary concerns in connected vehicle applications. In this paper, we investigate the impact of cyberattacks on the vehicle-to-infrastructure (V2I) network from a V2I application point of view. Then, we develop a novel V2I cybersecurity architecture, named CVGuard, which can detect and prevent cyberattacks on the V2I environment. In designing CVGuard, key challenges such as scalability, resiliency, and future usability were considered. A case study using a distributed denial of service (DDoS) attack on a V2I application, i.e., the Stop Sign Gap Assist (SSGA) application, shows that CVGuard was effective in mitigating the adverse effects created by the attack. In our case study, the DDoS attack caused conflicts between minor and major road vehicles at an unsignalized intersection, which could have led to crashes. With CVGuard in operation, the number of conflicts between vehicles decreased. The reduction was measured by comparing the number of conflicts before and after the implementation and operation of the CVGuard security platform. Analysis revealed that the strategies adopted by CVGuard reduced inter-vehicle conflicts by 60% when a DDoS attack compromised the SSGA application at an unsignalized intersection.