CRAug 18, 2023
DFB: A Data-Free, Low-Budget, and High-Efficacy Clean-Label Backdoor AttackBinhao Ma, Jiahui Wang, Dejun Wang et al.
In the domain of backdoor attacks, accurate labeling of injected data is essential for evading rudimentary detection mechanisms. This imperative has catalyzed the development of clean-label attacks, which are notably more elusive as they preserve the original labels of the injected data. Current clean-label attack methodologies primarily depend on extensive knowledge of the training dataset. However, practically, such comprehensive dataset access is often unattainable, given that training datasets are typically compiled from various independent sources. Departing from conventional clean-label attack methodologies, our research introduces DFB, a data-free, low-budget, and high-efficacy clean-label backdoor Attack. DFB is unique in its independence from training data access, requiring solely the knowledge of a specific target class. Tested on CIFAR10, Tiny-ImageNet, and TSRD, DFB demonstrates remarkable efficacy with minimal poisoning rates of just 0.1%, 0.025%, and 0.4%, respectively. These rates are significantly lower than those required by existing methods such as LC, HTBA, BadNets, and Blend, yet DFB achieves superior attack success rates. Furthermore, our findings reveal that DFB poses a formidable challenge to four established backdoor defense algorithms, indicating its potential as a robust tool in advanced clean-label attack strategies.
AIAug 23, 2024
QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment RetrievalChenghua Gao, Min Li, Jianshuo Liu et al.
Video Moment Retrieval (VMR) aims to retrieve relevant moments of an untrimmed video corresponding to the query. While cross-modal interaction approaches have shown progress in filtering out query-irrelevant information in videos, they assume the precise alignment between the query semantics and the corresponding video moments, potentially overlooking the misunderstanding of the natural language semantics. To address this challenge, we propose a novel model called \textit{QD-VMR}, a query debiasing model with enhanced contextual understanding. Firstly, we leverage a Global Partial Aligner module via video clip and query features alignment and video-query contrastive learning to enhance the cross-modal understanding capabilities of the model. Subsequently, we employ a Query Debiasing Module to obtain debiased query features efficiently, and a Visual Enhancement module to refine the video features related to the query. Finally, we adopt the DETR structure to predict the possible target video moments. Through extensive evaluations of three benchmark datasets, QD-VMR achieves state-of-the-art performance, proving its potential to improve the accuracy of VMR. Further analytical experiments demonstrate the effectiveness of our proposed module. Our code will be released to facilitate future research.
AIMay 13, 2025
Deep Reinforcement Learning for Power Grid Multi-Stage Cascading Failure MitigationBo Meng, Chenghao Xu, Yongli Zhu
Cascading failures in power grids can lead to grid collapse, causing severe disruptions to social operations and economic activities. In certain cases, multi-stage cascading failures can occur. However, existing cascading-failure-mitigation strategies are usually single-stage-based, overlooking the complexity of the multi-stage scenario. This paper treats the multi-stage cascading failure problem as a reinforcement learning task and develops a simulation environment. The reinforcement learning agent is then trained via the deterministic policy gradient algorithm to achieve continuous actions. Finally, the effectiveness of the proposed approach is validated on the IEEE 14-bus and IEEE 118-bus systems.
CVSep 7, 2021
Hierarchical Graph Convolutional Skeleton Transformer for Action RecognitionRuwen Bai, Min Li, Bo Meng et al.
Graph convolutional networks (GCNs) have emerged as dominant methods for skeleton-based action recognition. However, they still suffer from two problems, namely, neighborhood constraints and entangled spatiotemporal feature representations. Most studies have focused on improving the design of graph topology to solve the first problem but they have yet to fully explore the latter. In this work, we design a disentangled spatiotemporal transformer (DSTT) block to overcome the above limitations of GCNs in three steps: (i) feature disentanglement for spatiotemporal decomposition;(ii) global spatiotemporal attention for capturing correlations in the global context; and (iii) local information enhancement for utilizing more local information. Thereon, we propose a novel architecture, named Hierarchical Graph Convolutional skeleton Transformer (HGCT), to employ the complementary advantages of GCN (i.e., local topology, temporal dynamics and hierarchy) and Transformer (i.e., global context and dynamic attention). HGCT is lightweight and computationally efficient. Quantitative analysis demonstrates the superiority and good interpretability of HGCT.
CVAug 27, 2021
Rethinking the Misalignment Problem in Dense Object DetectionYang Yang, Min Li, Bo Meng et al.
Object detection aims to localize and classify the objects in a given image, and these two tasks are sensitive to different object regions. Therefore, some locations predict high-quality bounding boxes but low classification scores, and some locations are quite the opposite. A misalignment exists between the two tasks, and their features are spatially entangled. In order to solve the misalignment problem, we propose a plug-in Spatial-disentangled and Task-aligned operator (SALT). By predicting two task-aware point sets that are located in each task's sensitive regions, SALT can reassign features from those regions and align them to the corresponding anchor point. Therefore, features for the two tasks are spatially aligned and disentangled. To minimize the difference between the two regression stages, we propose a Self-distillation regression (SDR) loss that can transfer knowledge from the refined regression results to the coarse regression results. On the basis of SALT and SDR loss, we propose SALT-Net, which explicitly exploits task-aligned point-set features for accurate detection results. Extensive experiments on the MS-COCO dataset show that our proposed methods can consistently boost different state-of-the-art dense detectors by $\sim$2 AP. Notably, SALT-Net with Res2Net-101-DCN backbone achieves 53.8 AP on the MS-COCO test-dev.
CVApr 29, 2021
Objects as Extreme PointsYang Yang, Min Li, Bo Meng et al.
Object detection can be regarded as a pixel clustering task, and its boundary is determined by four extreme points (leftmost, top, rightmost, and bottom). However, most studies focus on the center or corner points of the object, which are actually conditional results of the extreme points. In this paper, we present an Extreme-Point-Prediction- Based object detector (EPP-Net), which directly regresses the relative displacement vector between each pixel and the four extreme points. We also propose a new metric to measure the similarity between two groups of extreme points, namely, Extreme Intersection over Union (EIoU), and incorporate this EIoU as a new regression loss. Moreover, we propose a novel branch to predict the EIoU between the ground-truth and the prediction results, and take it as the localization confidence to filter out poor detection results. On the MS-COCO dataset, our method achieves an average precision (AP) of 44.0% with ResNet-50 and an AP of 50.3% with ResNeXt-101-DCN. The proposed EPP-Net provides a new method to detect objects and outperforms state-of-the-art anchor-free detectors.