LG Jan 27, 2024
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks
Yige Li, Jiabo He, Hanxun Huang et al.
Backdoor attacks have become a significant threat to the pre-training and deployment of deep neural networks (DNNs). Although numerous methods for detecting and mitigating backdoor attacks have been proposed, most rely on identifying and eliminating the "shortcut" created by the backdoor, which links a specific source class to a target class. However, these approaches can be easily circumvented by designing multiple backdoor triggers that create shortcuts everywhere and therefore nowhere specific. In this study, we explore the concept of Multi-Trigger Backdoor Attacks (MTBAs), where multiple adversaries leverage different types of triggers to poison the same dataset. By proposing and investigating three types of multi-trigger attacks (parallel, sequential, and hybrid), we demonstrate that 1) multiple triggers can coexist, overwrite, or cross-activate one another, and 2) MTBAs easily break the prevalent shortcut assumption underlying most existing backdoor detection/removal methods, rendering them ineffective. Given the security risk posed by MTBAs, we have created a multi-trigger backdoor poisoning dataset to facilitate future research on detecting and mitigating these attacks, and we also discuss potential defense strategies against MTBAs. Our code is available at https://github.com/bboylyg/Multi-Trigger-Backdoor-Attacks.
CV May 11
ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
Tingshu Mou, Jiabo He, Renying Wang et al.
Recent advances in Multi-modal Large Language Models (MLLMs) target 3D spatial intelligence, yet the progress has been largely driven by post-training on curated benchmarks, leaving inference-time approaches relatively underexplored. In this paper, we take a training-free perspective and introduce ViSRA, a human-aligned Video-based Spatial Reasoning Agent, as a framework to probe the spatial reasoning mechanism of MLLMs. ViSRA elicits spatial reasoning in a modular and extensible manner by leveraging explicit spatial information from expert models, enabling a flexible plug-and-play paradigm. ViSRA offers two key advantages: (1) human-aligned and transferable 3D understanding rather than task-specific overfitting; and (2) no post-training computational cost and no heavy manual curation of spatial reasoning datasets. Experimental results demonstrate consistent improvement across a set of MLLMs on both existing benchmarks and unseen 3D spatial reasoning tasks, with ViSRA outperforming baselines by absolute margins of up to 15.6% and 28.9%, respectively.
CV Oct 28, 2021
SpineOne: A One-Stage Detection Framework for Degenerative Discs and Vertebrae
Jiabo He, Wei Liu, Yu Wang et al.
Spinal degeneration plagues many elders, office workers, and even the younger generations. Effective pharmacological or surgical interventions can help relieve degenerative spine conditions. However, the traditional diagnosis procedure is often too laborious. Clinical experts need to detect discs and vertebrae from spinal magnetic resonance imaging (MRI) or computed tomography (CT) images as a preliminary step to perform pathological diagnosis or preoperative evaluation. Machine learning systems have been developed to aid this procedure, generally following a two-stage methodology: first perform anatomical localization, then pathological classification. Towards more efficient and accurate diagnosis, we propose a one-stage detection framework termed SpineOne to simultaneously localize and classify degenerative discs and vertebrae from MRI slices. SpineOne is built upon the following three key techniques: 1) a new design of the keypoint heatmap to facilitate simultaneous keypoint localization and classification; 2) the use of attention modules to better differentiate the representations between discs and vertebrae; and 3) a novel gradient-guided objective association mechanism to associate multiple learning objectives at the later training stage. Empirical results on the Spinal Disease Intelligent Diagnosis Tianchi Competition (SDID-TC) dataset of 550 exams demonstrate that our approach surpasses existing methods by a large margin.
CV Oct 26, 2021
Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression
Jiabo He, Sarah Erfani, Xingjun Ma et al.
Bounding box (bbox) regression is a fundamental task in computer vision. So far, the most commonly used loss functions for bbox regression are the Intersection over Union (IoU) loss and its variants. In this paper, we generalize existing IoU-based losses to a new family of power IoU losses that have a power IoU term and an additional power regularization term with a single power parameter $α$. We call this new family of losses the $α$-IoU losses and analyze properties such as order preservingness and loss/gradient reweighting. Experiments on multiple object detection benchmarks and models demonstrate that $α$-IoU losses 1) can surpass existing IoU-based losses by a noticeable performance margin; 2) offer detectors more flexibility in achieving different levels of bbox regression accuracy by modulating $α$; and 3) are more robust to small datasets and noisy bboxes.
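As a rough illustration of the power IoU term described in the abstract above, the following minimal sketch computes 1 - IoU^α for two axis-aligned boxes. It covers only the power IoU term; the additional power regularization term is omitted because the abstract does not specify it. The function names `box_iou` and `alpha_iou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example, not the authors' released code.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2).

    The box layout is an assumption of this sketch.
    """
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha):
    """Power IoU term only: 1 - IoU**alpha, for alpha > 0.

    With alpha = 1 this reduces to the plain IoU loss (1 - IoU).
    The regularization term mentioned in the abstract is not modeled here.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 - box_iou(pred, target) ** alpha


if __name__ == "__main__":
    pred, target = (0, 0, 2, 2), (1, 1, 3, 3)
    for alpha in (1.0, 2.0, 3.0):
        print(alpha, alpha_iou_loss(pred, target, alpha))
```

Because x ↦ x^α is increasing on [0, 1] for any α > 0, a prediction with a higher IoU always receives a lower loss in this sketch, which is one way to read the order-preserving property the abstract mentions.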
IV Sep 23, 2020
Learning Non-Unique Segmentation with Reward-Penalty Dice Loss
Jiabo He, Sarah Erfani, Sudanthi Wijewickrema et al.
Semantic segmentation is one of the key problems in the field of computer vision, as it enables computer image understanding. However, most research and applications of semantic segmentation focus on addressing unique segmentation problems, where there is only one gold standard segmentation result for every input image. This may not be true in some problems, e.g., medical applications. We may have non-unique segmentation annotations, as different surgeons may perform successful surgeries for the same patient in slightly different ways. To comprehensively learn non-unique segmentation tasks, we propose the reward-penalty Dice loss (RPDL) function as the optimization objective for deep convolutional neural networks (DCNNs). RPDL is capable of helping DCNNs learn non-unique segmentation by enhancing common regions and penalizing outside ones. Experimental results show that RPDL improves the performance of DCNN models by up to 18.4% compared with other loss functions on our collected surgical dataset.
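For context, the standard soft Dice loss, which the name RPDL refers to, can be sketched as below. This is the plain Dice baseline only, not RPDL itself: the abstract does not give the reward and penalty weighting, so none is modeled. The function name `dice_loss`, the flat-list inputs, and the smoothing constant `eps` are assumptions of this example.

```python
def dice_loss(pred, target, eps=1e-6):
    """Standard soft Dice loss over flattened masks (baseline, not RPDL).

    pred:   predicted foreground probabilities in [0, 1]
    target: ground-truth labels (0 or 1), same length as pred
    eps:    smoothing constant (an assumption) that avoids division by zero
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Overlap between prediction and ground truth.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + eps) / (total + eps)
    return 1.0 - dice


if __name__ == "__main__":
    # Two predicted foreground pixels, one of which is correct.
    print(dice_loss([1.0, 1.0, 0.0, 0.0], [1, 0, 0, 0]))
```

With several valid annotations per image, a single-reference loss like this one penalizes any prediction that matches a different, equally valid annotation, which is the limitation the abstract's non-unique setting is meant to address.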