Shangce Gao

NE
h-index16
14papers
58citations
Novelty43%
AI Score52

14 Papers

CVMar 14Code
VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction

Hiroto Nakata, Yawen Zou, Shunsuke Sakai et al.

Logical anomaly detection in industrial inspection remains challenging due to variations in visual appearance (e.g., background clutter, illumination shift, and blur), which often distract vision-centric detectors from identifying rule-level violations. However, existing benchmarks rarely provide controlled settings where logical states are fixed while such nuisance factors vary. To address this gap, we introduce VID-AD, a dataset for logical anomaly detection under vision-induced distraction. It comprises 10 manufacturing scenarios and five capture conditions, totaling 50 one-class tasks and 10,395 images. Each scenario is defined by two logical constraints selected from quantity, length, type, placement, and relation, with anomalies including both single-constraint and combined violations. We further propose a language-based anomaly detection framework that relies solely on text descriptions generated from normal images. Using contrastive learning with positive texts and contradiction-based negative texts synthesized from these descriptions, our method learns embeddings that capture logical attributes rather than low-level features. Extensive experiments demonstrate consistent improvements over baselines across the evaluated settings. The dataset is available at: https://github.com/nkthiroto/VID-AD.

LGDec 28, 2022
Differentiable Search of Accurate and Robust Architectures

Yuwei Ou, Xiangning Xie, Shangce Gao et al.

Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of the adversarial training is greatly limited by the architectures of target DNNs, which often makes the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for the neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specially for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of the adversarial training in enhancing the robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss utilizing the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm under natural data and various adversarial attacks, which reveals the superiority of the proposed method in terms of both accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust neural architectures.

CVOct 20, 2023
Superpixel Semantics Representation and Pre-training for Vision-Language Task

Siyu Zhang, Yeming Chen, Yaoru Sun et al.

The key to integrating visual language tasks is to establish a good alignment strategy. Recently, visual semantic representation has achieved fine-grained visual understanding by dividing grids or image patches. However, the coarse-grained semantic interactions in image space should not be ignored, which hinders the extraction of complex contextual semantic relations at the scene boundaries. This paper proposes superpixels as comprehensive and robust visual primitives, which mine coarse-grained semantic interactions by clustering perceptually similar pixels, speeding up the subsequent processing of primitives. To capture superpixel-level semantic features, we propose a Multiscale Difference Graph Convolutional Network (MDGCN). It allows parsing the entire image as a fine-to-coarse visual hierarchy. To reason actual semantic relations, we reduce potential noise interference by aggregating difference information between adjacent graph nodes. Finally, we propose a multi-level fusion rule in a bottom-up manner to avoid understanding deviation by mining complementary spatial information at different levels. Experiments show that the proposed method can effectively promote the learning of multiple downstream tasks. Encouragingly, our method outperforms previous methods on all metrics. Our code will be released upon publication.

NEApr 4
RDEx-CMOP: Feasibility-Aware Indicator-Guided Differential Evolution for Fixed-Budget Constrained Multiobjective Optimization

Sichen Tao, Yifei Yang, Ruihan Zhao et al.

Constrained multiobjective optimisation requires fast feasibility attainment together with stable convergence and diversity preservation under strict evaluation budgets. This report documents RDEx-CMOP, the differential evolution variant used in the IEEE CEC 2025 numerical optimisation competition (C06 special session) constrained multiobjective track. RDEx-CMOP integrates an ε-level feasibility schedule, a SPEA2-style indicator-driven fitness assignment, and a fitness-oriented current-to-pbest/1 mutation operator. We evaluate RDEx-CMOP on the official CEC 2025 CMOP benchmark using the median-target U-score framework and the released trace data. Experimental results show that RDEx-CMOP achieves the highest total score and the best overall average rank among all released comparison algorithms, with strong target-attainment behaviour and near-zero final violation on most problems.

NEMar 28
RDEx-SOP: Exploitation-Biased Reconstructed Differential Evolution for Fixed-Budget Bound-Constrained Single-Objective Optimization

Sichen Tao, Yifei Yang, Ruihan Zhao et al.

Bound-constrained single-objective numerical optimisation remains a key benchmark for assessing the robustness and efficiency of evolutionary algorithms. This report documents RDEx-SOP, an exploitation-biased success-history differential evolution variant used in the IEEE CEC 2025 numerical optimisation competition (C06 special session). RDEx-SOP combines success-history parameter adaptation, an exploitation-biased hybrid branch, and lightweight local perturbations to balance fast convergence and final solution quality under a strict evaluation budget. We evaluate RDEx-SOP on the official CEC 2025 SOP benchmark with the U-score framework (Speed and Accuracy categories). Experimental results show that RDEx-SOP achieves strong overall performance and statistically competitive final outcomes across the 29 benchmark functions.

NEMar 28
RDEx-CSOP: Feasibility-Aware Reconstructed Differential Evolution with Adaptive epsilon-Constraint Ranking

Sichen Tao, Yifei Yang, Ruihan Zhao et al.

Constrained single-objective numerical optimisation requires both feasibility maintenance and strong objective-value convergence under limited evaluation budgets. This report documents RDEx-CSOP, a constrained differential evolution variant used in the IEEE CEC 2025 numerical optimisation competition (C06 special session). RDEx-CSOP combines success-history parameter adaptation with an exploitation-biased hybrid search and an ε-constraint handling mechanism with a time-varying threshold. We evaluate RDEx-CSOP on the official CEC 2025 CSOP benchmark using the U-score framework (Speed, Accuracy, and Constraint categories). The results show that RDEx-CSOP achieves the highest total score and the best average rank among all released comparison algorithms, mainly through strong speed and competitive constraint-handling performance across the 28 benchmark functions.

NEMar 28
RDEx-MOP: Indicator-Guided Reconstructed Differential Evolution for Fixed-Budget Multiobjective Optimization

Sichen Tao, Yifei Yang, Ruihan Zhao et al.

Multiobjective optimisation in the CEC 2025 MOP track is evaluated not only by final IGD values but also by how quickly an algorithm reaches the target region under a fixed evaluation budget. This report documents RDEx-MOP, the reconstructed differential evolution variant used in the IEEE CEC 2025 numerical optimisation competition (C06 special session) bound-constrained multiobjective track. RDEx-MOP integrates indicator-based environmental selection, a niche-maintained Pareto-candidate set, and complementary differential evolution operators for exploration and exploitation. We evaluate RDEx-MOP on the official CEC 2025 MOP benchmark using the released checkpoint traces and the median-target U-score framework. Experimental results show that RDEx-MOP achieves the highest total score and the best average rank among all released comparison algorithms, including the earlier RDEx baseline.

CLJun 27, 2021Code
A Cascade Dual-Decoder Model for Joint Entity and Relation Extraction

Jian Cheng, Tian Zhang, Shuang Zhang et al.

In knowledge graph construction, a challenging issue is how to extract complex (e.g., overlapping) entities and relationships from a small amount of unstructured historical data. The traditional pipeline methods are to divide the extraction into two separate subtasks, which misses the potential interaction between the two subtasks and may lead to error propagation. In this work, we propose an effective cascade dual-decoder method to extract overlapping relational triples, which includes a text-specific relation decoder and a relation-corresponded entity decoder. Our approach is straightforward and it includes a text-specific relation decoder and a relation-corresponded entity decoder. The text-specific relation decoder detects relations from a sentence at the text level. That is, it does this according to the semantic information of the whole sentence. For each extracted relation, which is with trainable embedding, the relation-corresponded entity decoder detects the corresponding head and tail entities using a span-based tagging scheme. In this way, the overlapping triple problem can be tackled naturally. We conducted experiments on a real-world open-pit mine dataset and two public datasets to verify the method's generalizability. The experimental results demonstrate the effectiveness and competitiveness of our proposed method and achieve better F1 scores under strict evaluation metrics. Our implementation is available at https://github.com/prastunlp/DualDec.

CVDec 15, 2025
3D Human-Human Interaction Anomaly Detection

Shun Maeda, Chunzhi Gu, Koichiro Kamide et al.

Human-centric anomaly detection (AD) has been primarily studied to specify anomalous behaviors in a single person. However, as humans by nature tend to act in a collaborative manner, behavioral anomalies can also arise from human-human interactions. Detecting such anomalies using existing single-person AD models is prone to low accuracy, as these approaches are typically not designed to capture the complex and asymmetric dynamics of interactions. In this paper, we introduce a novel task, Human-Human Interaction Anomaly Detection (H2IAD), which aims to identify anomalous interactive behaviors within collaborative 3D human actions. To address H2IAD, we then propose Interaction Anomaly Detection Network (IADNet), which is formalized with a Temporal Attention Sharing Module (TASM). Specifically, in designing TASM, we share the encoded motion embeddings across both people such that collaborative motion correlations can be effectively synchronized. Moreover, we notice that in addition to temporal dynamics, human interactions are also characterized by spatial configurations between two people. We thus introduce a Distance-Based Relational Encoding Module (DREM) to better reflect social cues in H2IAD. The normalizing flow is eventually employed for anomaly scoring. Extensive experiments on human-human motion benchmarks demonstrate that IADNet outperforms existing Human-centric AD baselines in H2IAD.

NEApr 25, 2024
An Efficient Reconstructed Differential Evolution Variant by Some of the Current State-of-the-art Strategies for Solving Single Objective Bound Constrained Problems

Sichen Tao, Ruihan Zhao, Kaiyu Wang et al.

Complex single-objective bounded problems are often difficult to solve. In evolutionary computation methods, since the proposal of differential evolution algorithm in 1997, it has been widely studied and developed due to its simplicity and efficiency. These developments include various adaptive strategies, operator improvements, and the introduction of other search methods. After 2014, research based on LSHADE has also been widely studied by researchers. However, although recently proposed improvement strategies have shown superiority over their previous generation's first performance, adding all new strategies may not necessarily bring the strongest performance. Therefore, we recombine some effective advances based on advanced differential evolution variants in recent years and finally determine an effective combination scheme to further promote the performance of differential evolution. In this paper, we propose a strategy recombination and reconstruction differential evolution algorithm called reconstructed differential evolution (RDE) to solve single-objective bounded optimization problems. Based on the benchmark suite of the 2024 IEEE Congress on Evolutionary Computation (CEC2024), we tested RDE and several other advanced differential evolution variants. The experimental results show that RDE has superior performance in solving complex optimization problems.

CVApr 26, 2024
Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

Shun Maeda, Chunzhi Gu, Jun Yu et al.

We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalous human behaviors. To address this task, we propose a normalizing flow (NF)-based detection framework where the sample likelihood is effectively leveraged to indicate anomalies. As action anomalies often occur in some specific body parts, in addition to the full-body action feature learning, we incorporate extra encoding streams into our framework for a finer modeling of body subsets. Our framework is thus multi-level to jointly discover global and local motion anomalies. Furthermore, to show awareness of the potentially jittery data during recording, we resort to discrete cosine transformation by converting the action samples from the temporal to the frequency domain to mitigate the issue of data instability. Extensive experimental results on two human action datasets demonstrate that our method outperforms the baselines formed by adapting state-of-the-art human activity AD approaches to our task of HAAD.

ROMar 13
AOMGen: Photoreal, Physics-Consistent Demonstration Generation for Articulated Object Manipulation

Yulu Wu, Jiujun Cheng, Haowen Wang et al.

Recent advances in Vision-Language-Action (VLA) and world-model methods have improved generalization in tasks such as robotic manipulation and object interaction. However, Successful execution of such tasks depends on large, costly collections of real demonstrations, especially for fine-grained manipulation of articulated objects. To address this, we present AOMGen, a scalable data generation framework for articulated manipulation which is instantiated from a single real scan, demonstration and a library of readily available digital assets, yielding photoreal training data with verified physical states. The framework synthesizes synchronized multi-view RGB temporally aligned with action commands and state annotations for joints and contacts, and systematically varies camera viewpoints, object styles, and object poses to expand a single execution into a diverse corpus. Experimental results demonstrate that fine-tuning VLA policies on AOMGen data increases the success rate from 0% to 88.7%, and the policies are tested on unseen objects and layouts.

LGAug 5, 2025
Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization

Haidong Kang, Lianbo Ma, Guo Yu et al.

Mixed precision quantization (MPQ) is an effective quantization approach to achieve accuracy-complexity trade-off of neural network, through assigning different bit-widths to network activations and weights in each layer. The typical way of existing MPQ methods is to optimize quantization policies (i.e., bit-width allocation) in a gradient descent manner, termed as Differentiable (DMPQ). At the end of the search, the bit-width associated to the quantization parameters which has the largest value will be selected to form the final mixed precision quantization policy, with the implicit assumption that the values of quantization parameters reflect the operation contribution to the accuracy improvement. While much has been discussed about the MPQ improvement, the bit-width selection process has received little attention. We study this problem and argue that the magnitude of quantization parameters does not necessarily reflect the actual contribution of the bit-width to the task performance. Then, we propose a Shapley-based MPQ (SMPQ) method, which measures the bit-width operation direct contribution on the MPQ task. To reduce computation cost, a Monte Carlo sampling-based approximation strategy is proposed for Shapley computation. Extensive experiments on mainstream benchmarks demonstrate that our SMPQ consistently achieves state-of-the-art performance than gradient-based competitors.

NEJan 27, 2021
ASBSO: An Improved Brain Storm Optimization With Flexible Search Length and Memory-Based Selection

Yang Yu, Shangce Gao, Yirui Wang et al.

Brain storm optimization (BSO) is a newly proposed population-based optimization algorithm, which uses a logarithmic sigmoid transfer function to adjust its search range during the convergent process. However, this adjustment only varies with the current iteration number and lacks of flexibility and variety which makes a poor search effciency and robustness of BSO. To alleviate this problem, an adaptive step length structure together with a success memory selection strategy is proposed to be incorporated into BSO. This proposed method, adaptive step length based on memory selection BSO, namely ASBSO, applies multiple step lengths to modify the generation process of new solutions, thus supplying a flexible search according to corresponding problems and convergent periods. The novel memory mechanism, which is capable of evaluating and storing the degree of improvements of solutions, is used to determine the selection possibility of step lengths. A set of 57 benchmark functions are used to test ASBSO's search ability, and four real-world problems are adopted to show its application value. All these test results indicate the remarkable improvement in solution quality, scalability, and robustness of ASBSO.