Wen Yao

LG
h-index18
66papers
1,378citations
Novelty51%
AI Score59

66 Papers

LGSep 1, 2022Code
Physics-informed MTA-UNet: Prediction of Thermal Stress and Thermal Deformation of Satellites

Zeyu Cao, Wen Yao, Wei Peng et al.

The rapid analysis of thermal stress and deformation plays a pivotal role in the thermal control measures and optimization of the structural design of satellites. For achieving real-time thermal stress and thermal deformation analysis of satellite motherboards, this paper proposes a novel Multi-Task Attention UNet (MTA-UNet) neural network which combines the advantages of both Multi-Task Learning (MTL) and U-Net with attention mechanism. Besides, a physics-informed strategy is used in the training process, where partial differential equations (PDEs) are integrated into the loss functions as residual terms. Finally, an uncertainty-based loss balancing approach is applied to weight different loss functions of multiple training tasks. Experimental results show that the proposed MTA-UNet effectively improves the prediction accuracy of multiple physics tasks compared with Single-Task Learning (STL) models. In addition, the physics-informed method brings less error in the prediction of each task, especially on small data sets. The code can be downloaded at: \url{https://github.com/KomorebiTso/MTA-UNet}.

CVJul 14, 2023
RFLA: A Stealthy Reflected Light Adversarial Attack in the Physical World

Donghua Wang, Wen Yao, Tingsong Jiang et al.

Physical adversarial attacks against deep neural networks (DNNs) have recently gained increasing attention. The current mainstream physical attacks use printed adversarial patches or camouflage to alter the appearance of the target object. However, these approaches generate conspicuous adversarial patterns that show poor stealthiness. Another physical deployable attack is the optical attack, featuring stealthiness while exhibiting weakly in the daytime with sunlight. In this paper, we propose a novel Reflected Light Attack (RFLA), featuring effective and stealthy in both the digital and physical world, which is implemented by placing the color transparent plastic sheet and a paper cut of a specific shape in front of the mirror to create different colored geometries on the target object. To achieve these goals, we devise a general framework based on the circle to model the reflected light on the target object. Specifically, we optimize a circle (composed of a coordinate and radius) to carry various geometrical shapes determined by the optimized angle. The fill color of the geometry shape and its corresponding transparency are also optimized. We extensively evaluate the effectiveness of RFLA on different datasets and models. Experiment results suggest that the proposed method achieves over 99% success rate on different datasets and models in the digital world. Additionally, we verify the effectiveness of the proposed method in different physical environments by using sunlight or a flashlight.

LGMay 14, 2022
Bayesian Physics-Informed Extreme Learning Machine for Forward and Inverse PDE Problems with Noisy Data

Xu Liu, Wen Yao, Wei Peng et al.

Physics-informed extreme learning machine (PIELM) has recently received significant attention as a rapid version of physics-informed neural network (PINN) for solving partial differential equations (PDEs). The key characteristic is to fix the input layer weights with random values and use Moore-Penrose generalized inverse for the output layer weights. The framework is effective, but it easily suffers from overfitting noisy data and lacks uncertainty quantification for the solution under noise scenarios.To this end, we develop the Bayesian physics-informed extreme learning machine (BPIELM) to solve both forward and inverse linear PDE problems with noisy data in a unified framework. In our framework, a prior probability distribution is introduced in the output layer for extreme learning machine with physic laws and the Bayesian method is used to estimate the posterior of parameters. Besides, for inverse PDE problems, problem parameters considered as new output layer weights are unified in a framework with forward PDE problems. Finally, we demonstrate BPIELM considering both forward problems, including Poisson, advection, and diffusion equations, as well as inverse problems, where unknown problem parameters are estimated. The results show that, compared with PIELM, BPIELM quantifies uncertainty arising from noisy data and provides more accurate predictions. In addition, BPIELM is considerably cheaper than PINN in terms of the computational cost.

CVNov 25, 2023Code
HyperDID: Hyperspectral Intrinsic Image Decomposition with Deep Feature Embedding

Zhiqiang Gong, Xian Zhou, Wen Yao et al.

The dissection of hyperspectral images into intrinsic components through hyperspectral intrinsic image decomposition (HIID) enhances the interpretability of hyperspectral data, providing a foundation for more accurate classification outcomes. However, the classification performance of HIID is constrained by the model's representational ability. To address this limitation, this study rethinks hyperspectral intrinsic image decomposition for classification tasks by introducing deep feature embedding. The proposed framework, HyperDID, incorporates the Environmental Feature Module (EFM) and Categorical Feature Module (CFM) to extract intrinsic features. Additionally, a Feature Discrimination Module (FDM) is introduced to separate environment-related and category-related features. Experimental results across three commonly used datasets validate the effectiveness of HyperDID in improving hyperspectral image classification performance. This novel approach holds promise for advancing the capabilities of hyperspectral image analysis by leveraging deep feature embedding principles. The implementation of the proposed method could be accessed soon at https://github.com/shendu-sw/HyperDID for the sake of reproducibility.

CVMar 8, 2022Code
Contrastive Enhancement Using Latent Prototype for Few-Shot Segmentation

Xiaoyu Zhao, Xiaoqian Chen, Zhiqiang Gong et al.

Few-shot segmentation enables the model to recognize unseen classes with few annotated examples. Most existing methods adopt prototype learning architecture, where support prototype vectors are expanded and concatenated with query features to perform conditional segmentation. However, such framework potentially focuses more on query features while may neglect the similarity between support and query features. This paper proposes a contrastive enhancement approach using latent prototypes to leverage latent classes and raise the utilization of similarity information between prototype and query features. Specifically, a latent prototype sampling module is proposed to generate pseudo-mask and novel prototypes based on features similarity. The module conveniently conducts end-to-end learning and has no strong dependence on clustering numbers like cluster-based method. Besides, a contrastive enhancement module is developed to drive models to provide different predictions with the same query features. Our method can be used as an auxiliary module to flexibly integrate into other baselines for a better segmentation performance. Extensive experiments show our approach remarkably improves the performance of state-of-the-art methods for 1-shot and 5-shot segmentation, especially outperforming baseline by 5.9% and 7.3% for 5-shot task on Pascal-5^i and COCO-20^i. Source code is available at https://github.com/zhaoxiaoyu1995/CELP-Pytorch

LGMay 2, 2022
RANG: A Residual-based Adaptive Node Generation Method for Physics-Informed Neural Networks

Wei Peng, Weien Zhou, Xiaoya Zhang et al.

Learning solutions of partial differential equations (PDEs) with Physics-Informed Neural Networks (PINNs) is an attractive alternative approach to traditional solvers due to its flexibility and ease of incorporating observed data. Despite the success of PINNs in accurately solving a wide variety of PDEs, the method still requires improvements in terms of computational efficiency. One possible improvement idea is to optimize the generation of training point sets. Residual-based adaptive sampling and quasi-uniform sampling approaches have been each applied to improve the training effects of PINNs, respectively. To benefit from both methods, we propose the Residual-based Adaptive Node Generation (RANG) approach for efficient training of PINNs, which is based on a variable density nodal distribution method for RBF-FD. The method is also enhanced by a memory mechanism to further improve training stability. We conduct experiments on three linear PDEs and three nonlinear PDEs with various node generation methods, through which the accuracy and efficiency of the proposed method compared to the predominant uniform sampling approach is verified numerically.

CVOct 28, 2023Code
Deep Intrinsic Decomposition with Adversarial Learning for Hyperspectral Image Classification

Zhiqiang Gong, Xian Zhou, Wen Yao

Convolutional neural networks (CNNs) have been demonstrated their powerful ability to extract discriminative features for hyperspectral image classification. However, general deep learning methods for CNNs ignore the influence of complex environmental factor which enlarges the intra-class variance and decreases the inter-class variance. This multiplies the difficulty to extract discriminative features. To overcome this problem, this work develops a novel deep intrinsic decomposition with adversarial learning, namely AdverDecom, for hyperspectral image classification to mitigate the negative impact of environmental factors on classification performance. First, we develop a generative network for hyperspectral image (HyperNet) to extract the environmental-related feature and category-related feature from the image. Then, a discriminative network is constructed to distinguish different environmental categories. Finally, a environmental and category joint learning loss is developed for adversarial learning to make the deep model learn discriminative features. Experiments are conducted over three commonly used real-world datasets and the comparison results show the superiority of the proposed method. The implementation of the proposed method and other compared methods could be accessed at https://github.com/shendu-sw/Adversarial Learning Intrinsic Decomposition for the sake of reproducibility.

LGMar 15, 2022
A physics and data co-driven surrogate modeling approach for temperature field prediction on irregular geometric domain

Kairui Bao, Wen Yao, Xiaoya Zhang et al.

In the whole aircraft structural optimization loop, thermal analysis plays a very important role. But it faces a severe computational burden when directly applying traditional numerical analysis tools, especially when each optimization involves repetitive parameter modification and thermal analysis followed. Recently, with the fast development of deep learning, several Convolutional Neural Network (CNN) surrogate models have been introduced to overcome this obstacle. However, for temperature field prediction on irregular geometric domains (TFP-IGD), CNN can hardly be competent since most of them stem from processing for regular images. To alleviate this difficulty, we propose a novel physics and data co-driven surrogate modeling method. First, after adapting the Bezier curve in geometric parameterization, a body-fitted coordinate mapping is introduced to generate coordinate transforms between the irregular physical plane and regular computational plane. Second, a physics-driven CNN surrogate with partial differential equation (PDE) residuals as a loss function is utilized for fast meshing (meshing surrogate); then, we present a data-driven surrogate model based on the multi-level reduced-order method, aiming to learn solutions of temperature field in the above regular computational plane (thermal surrogate). Finally, combining the grid position information provided by the meshing surrogate with the scalar temperature field information provided by the thermal surrogate (combined model), we reach an end-to-end surrogate model from geometric parameters to temperature field prediction on an irregular geometric domain. Numerical results demonstrate that our method can significantly improve accuracy prediction on a smaller dataset while reducing the training time when compared with other CNN methods.

LGOct 19, 2022
Robust Regression with Highly Corrupted Data via Physics Informed Neural Networks

Wei Peng, Wen Yao, Weien Zhou et al.

Physics-informed neural networks (PINNs) have been proposed to solve two main classes of problems: data-driven solutions and data-driven discovery of partial differential equations. This task becomes prohibitive when such data is highly corrupted due to the possible sensor mechanism failing. We propose the Least Absolute Deviation based PINN (LAD-PINN) to reconstruct the solution and recover unknown parameters in PDEs - even if spurious data or outliers corrupt a large percentage of the observations. To further improve the accuracy of recovering hidden physics, the two-stage Median Absolute Deviation based PINN (MAD-PINN) is proposed, where LAD-PINN is employed as an outlier detector followed by MAD screening out the highly corrupted data. Then the vanilla PINN or its variants can be subsequently applied to exploit the remaining normal data. Through several examples, including Poisson's equation, wave equation, and steady or unsteady Navier-Stokes equations, we illustrate the generalizability, accuracy and efficiency of the proposed algorithms for recovering governing equations from noisy and highly corrupted measurement data.

CVSep 28, 2022
A Survey on Physical Adversarial Attack in Computer Vision

Donghua Wang, Wen Yao, Tingsong Jiang et al.

Over the past decade, deep learning has revolutionized conventional tasks that rely on hand-craft feature extraction with its strong feature learning capability, leading to substantial enhancements in traditional tasks. However, deep neural networks (DNNs) have been demonstrated to be vulnerable to adversarial examples crafted by malicious tiny noise, which is imperceptible to human observers but can make DNNs output the wrong result. Existing adversarial attacks can be categorized into digital and physical adversarial attacks. The former is designed to pursue strong attack performance in lab environments while hardly remaining effective when applied to the physical world. In contrast, the latter focus on developing physical deployable attacks, thus exhibiting more robustness in complex physical environmental conditions. Recently, with the increasing deployment of the DNN-based system in the real world, strengthening the robustness of these systems is an emergency, while exploring physical adversarial attacks exhaustively is the precondition. To this end, this paper reviews the evolution of physical adversarial attacks against DNN-based computer vision tasks, expecting to provide beneficial information for developing stronger physical adversarial attacks. Specifically, we first proposed a taxonomy to categorize the current physical adversarial attacks and grouped them. Then, we discuss the existing physical attacks and focus on the technique for improving the robustness of physical attacks under complex physical environmental conditions. Finally, we discuss the issues of the current physical adversarial attacks to be solved and give promising directions.

AIFeb 20, 2023
RecFNO: a resolution-invariant flow and heat field reconstruction method from sparse observations via Fourier neural operator

Xiaoyu Zhao, Xiaoqian Chen, Zhiqiang Gong et al.

Perception of the full state is an essential technology to support the monitoring, analysis, and design of physical systems, one of whose challenges is to recover global field from sparse observations. Well-known for brilliant approximation ability, deep neural networks have been attractive to data-driven flow and heat field reconstruction studies. However, limited by network structure, existing researches mostly learn the reconstruction mapping in finite-dimensional space and has poor transferability to variable resolution of outputs. In this paper, we extend the new paradigm of neural operator and propose an end-to-end physical field reconstruction method with both excellent performance and mesh transferability named RecFNO. The proposed method aims to learn the mapping from sparse observations to flow and heat field in infinite-dimensional space, contributing to a more powerful nonlinear fitting capacity and resolution-invariant characteristic. Firstly, according to different usage scenarios, we develop three types of embeddings to model the sparse observation inputs: MLP, mask, and Voronoi embedding. The MLP embedding is propitious to more sparse input, while the others benefit from spatial information preservation and perform better with the increase of observation data. Then, we adopt stacked Fourier layers to reconstruct physical field in Fourier space that regularizes the overall recovered field by Fourier modes superposition. Benefiting from the operator in infinite-dimensional space, the proposed method obtains remarkable accuracy and better resolution transferability among meshes. The experiments conducted on fluid mechanics and thermology problems show that the proposed method outperforms existing POD-based and CNN-based methods in most cases and has the capacity to achieve zero-shot super-resolution.

SYJun 15, 2022
Reliability Analysis of Complex Multi-State System Based on Universal Generating Function and Bayesian Network

Xu Liu, Wen Yao, Xiaohu Zheng et al.

In the complex multi-state system (MSS), reliability analysis is a significant research content, both for equipment design, manufacturing, usage and maintenance. Universal Generating Function (UGF) is an important method in the reliability analysis, which efficiently obtains the system reliability by a fast algebraic procedure. However, when structural relationships between subsystems or components are not clear or without explicit expressions, the UGF method is difficult to use or not applicable at all. Bayesian Network (BN) has a natural advantage in terms of uncertainty inference for the relationship without explicit expressions. For the number of components is extremely large, though, it has the defects of low efficiency. To overcome the respective defects of UGF and BN, a novel reliability analysis method called UGF-BN is proposed for the complex MSS. In the UGF-BN framework, the UGF method is firstly used to analyze the bottom components with a large number. Then probability distributions obtained are taken as the input of BN. Finally, the reliability of the complex MSS is modeled by the BN method. This proposed method improves the computational efficiency, especially for the MSS with the large number of bottom components. Besides, the aircraft reliability-based design optimization based on the UGF-BN method is further studied with budget constraints on mass, power, and cost. Finally, two cases are used to demonstrate and verify the proposed method.

CVOct 17, 2022
Differential Evolution based Dual Adversarial Camouflage: Fooling Human Eyes and Object Detectors

Jialiang Sun, Tingsong Jiang, Wen Yao et al.

Recent studies reveal that deep neural network (DNN) based object detectors are vulnerable to adversarial attacks in the form of adding the perturbation to the images, leading to the wrong output of object detectors. Most current existing works focus on generating perturbed images, also called adversarial examples, to fool object detectors. Though the generated adversarial examples themselves can remain a certain naturalness, most of them can still be easily observed by human eyes, which limits their further application in the real world. To alleviate this problem, we propose a differential evolution based dual adversarial camouflage (DE_DAC) method, composed of two stages to fool human eyes and object detectors simultaneously. Specifically, we try to obtain the camouflage texture, which can be rendered over the surface of the object. In the first stage, we optimize the global texture to minimize the discrepancy between the rendered object and the scene images, making human eyes difficult to distinguish. In the second stage, we design three loss functions to optimize the local texture, making object detectors ineffective. In addition, we introduce the differential evolution algorithm to search for the near-optimal areas of the object to attack, improving the adversarial performance under certain attack area limitations. Besides, we also study the performance of adaptive DE_DAC, which can be adapted to the environment. Experiments show that our proposed method could obtain a good trade-off between the fooling human eyes and object detectors under multiple specific scenes and objects.

FLU-DYNApr 14, 2023
Multi-fidelity prediction of fluid flow and temperature field based on transfer learning using Fourier Neural Operator

Yanfang Lyu, Xiaoyu Zhao, Zhiqiang Gong et al.

Data-driven prediction of fluid flow and temperature distribution in marine and aerospace engineering has received extensive research and demonstrated its potential in real-time prediction recently. However, usually large amounts of high-fidelity data are required to describe and accurately predict the complex physical information, while in reality, only limited high-fidelity data is available due to the high experiment/computational cost. Therefore, this work proposes a novel multi-fidelity learning method based on the Fourier Neural Operator by jointing abundant low-fidelity data and limited high-fidelity data under transfer learning paradigm. First, as a resolution-invariant operator, the Fourier Neural Operator is first and gainfully applied to integrate multi-fidelity data directly, which can utilize the scarce high-fidelity data and abundant low-fidelity data simultaneously. Then, the transfer learning framework is developed for the current task by extracting the rich low-fidelity data knowledge to assist high-fidelity modeling training, to further improve data-driven prediction accuracy. Finally, three typical fluid and temperature prediction problems are chosen to validate the accuracy of the proposed multi-fidelity model. The results demonstrate that our proposed method has high effectiveness when compared with other high-fidelity models, and has the high modeling accuracy of 99% for all the selected physical field problems. Significantly, the proposed multi-fidelity learning method has the potential of a simple structure with high precision, which can provide a reference for the construction of the subsequent model.

LGJan 17, 2023
Multi-fidelity surrogate modeling for temperature field prediction using deep convolution neural network

Yunyang Zhang, Zhiqiang Gong, Weien Zhou et al.

Temperature field prediction is of great importance in the thermal design of systems engineering, and building the surrogate model is an effective way for the task. Generally, large amounts of labeled data are required to guarantee a good prediction performance of the surrogate model, especially the deep learning model, which have more parameters and better representational ability. However, labeled data, especially high-fidelity labeled data, are usually expensive to obtain and sometimes even impossible. To solve this problem, this paper proposes a pithy deep multi-fidelity model (DMFM) for temperature field prediction, which takes advantage of low-fidelity data to boost the performance with less high-fidelity data. First, a pre-train and fine-tune paradigm are developed in DMFM to train the low-fidelity and high-fidelity data, which significantly reduces the complexity of the deep surrogate model. Then, a self-supervised learning method for training the physics-driven deep multi-fidelity model (PD-DMFM) is proposed, which fully utilizes the physics characteristics of the engineering systems and reduces the dependence on large amounts of labeled low-fidelity data in the training process. Two diverse temperature field prediction problems are constructed to validate the effectiveness of DMFM and PD-DMFM, and the result shows that the proposed method can greatly reduce the dependence of the model on high-fidelity data.

CVApr 21, 2023
Adversarial Infrared Blocks: A Multi-view Black-box Attack to Thermal Infrared Detectors in Physical World

Chengyin Hu, Weiwen Shi, Tingsong Jiang et al.

Infrared imaging systems have a vast array of potential applications in pedestrian detection and autonomous driving, and their safety performance is of great concern. However, few studies have explored the safety of infrared imaging systems in real-world settings. Previous research has used physical perturbations such as small bulbs and thermal "QR codes" to attack infrared imaging detectors, but such methods are highly visible and lack stealthiness. Other researchers have used hot and cold blocks to deceive infrared imaging detectors, but this method is limited in its ability to execute attacks from various angles. To address these shortcomings, we propose a novel physical attack called adversarial infrared blocks (AdvIB). By optimizing the physical parameters of the adversarial infrared blocks, this method can execute a stealthy black-box attack on thermal imaging system from various angles. We evaluate the proposed method based on its effectiveness, stealthiness, and robustness. Our physical tests show that the proposed method achieves a success rate of over 80% under most distance and angle conditions, validating its effectiveness. For stealthiness, our method involves attaching the adversarial infrared block to the inside of clothing, enhancing its stealthiness. Additionally, we test the proposed method on advanced detectors, and experimental results demonstrate an average attack success rate of 51.2%, proving its robustness. Overall, our proposed AdvIB method offers a promising avenue for conducting stealthy, effective and robust black-box attacks on thermal imaging system, with potential implications for real-world safety and security applications.

53.9LGMay 26
MTL-FNO: A Lightweight Multi-Task Fourier Neural Operator for Sparse Field Reconstruction

Siyu Ye, Shihang Li, Zhiqiang Gong et al.

Efficient onboard multi-field sparse reconstruction is essential for the autonomous operation of aerospace vehicles. While existing deep learning models exhibit promise for single-field reconstruction, deploying multiple independent models leads to prohibitive model size growth and fails to exploit cross-field correlations, particularly under few-shot conditions. To address these challenges, we first propose a lightweight multi-task Fourier neural operator (MTL-FNO), an end-to-end joint training framework based on hard parameter sharing. In each layer, the parameters are divided into shared and task-specific components to capture common features across fields while preserving task-specific characteristics. Moreover, the task-specific fine-tuning parameters are implemented as low-rank terms, achieving substantial model compression. Second, to address the difficulty of co-optimizing shared and task-specific parameters along with their real and imaginary parts, we revisit the FNO's spectral weight from a polar-form perspective and devise a physically meaningful decoupled optimization scheme. Specifically, we apply polar decomposition to slice-wise disentangle the spectral weight into a unitary tensor encoding phase information and a positive semi-definite tensor characterizing amplitude. By decoupling the optimization of phase and amplitude, our method can effectively mitigate tasks conflict. Meanwhile, to preserve unitary geometric fidelity during training, the Cayley transform is introduced to reparameterize the unitary tensor, converting the constrained optimization problem to an unconstrained one. Finally, the effectiveness of the proposed method under few-shot conditions is validated on two representative engineering cases. Results show that MTL-FNO achieves accuracy comparable to or even surpassing that of standard FNO, while reducing total model size by 76% and 60%, respectively.

LGSep 27, 2024
A physics-driven sensor placement optimization methodology for temperature field reconstruction

Xu Liu, Wen Yao, Wei Peng et al.

Perceiving the global field from sparse sensors has been a grand challenge in the monitoring, analysis, and design of physical systems. In this context, sensor placement optimization is a crucial issue. Most existing works require large and sufficient data to construct data-based criteria, which are intractable in data-free scenarios without numerical and experimental data. To this end, we propose a novel physics-driven sensor placement optimization (PSPO) method for temperature field reconstruction using a physics-based criterion to optimize sensor locations. In our methodological framework, we firstly derive the theoretical upper and lower bounds of the reconstruction error under noise scenarios by analyzing the optimal solution, proving that error bounds correlate with the condition number determined by sensor locations. Furthermore, the condition number, as the physics-based criterion, is used to optimize sensor locations by the genetic algorithm. Finally, the best sensors are validated by reconstruction models, including non-invasive end-to-end models, non-invasive reduced-order models, and physics-informed models. Experimental results, both on a numerical and an application case, demonstrate that the PSPO method significantly outperforms random and uniform selection methods, improving the reconstruction accuracy by nearly an order of magnitude. Moreover, the PSPO method can achieve comparable reconstruction accuracy to the existing data-driven placement optimization methods.

LGApr 4, 2022
Algorithms for Bayesian network modeling and reliability inference of complex multistate systems: Part II-Dependent systems

Xiaohu Zheng, Wen Yao, Xiaoqian Chen

In using the Bayesian network (BN) to construct the complex multistate system's reliability model as described in Part I, the memory storage requirements of the node probability table (NPT) will exceed the random access memory (RAM) of the computer. However, the proposed inference algorithm of Part I is not suitable for the dependent system. This Part II proposes a novel method for BN reliability modeling and analysis to apply the compression idea to the complex multistate dependent system. In this Part II, the dependent nodes and their parent nodes are equivalent to a block, based on which the multistate joint probability inference algorithm is proposed to calculate the joint probability distribution of a block's all nodes. Then, based on the proposed multistate compression algorithm of Part I, the dependent multistate inference algorithm is proposed for the complex multistate dependent system. The use and accuracy of the proposed algorithms are demonstrated in case 1. Finally, the proposed algorithms are applied to the reliability modeling and analysis of the satellite attitude control system. The results show that both Part I and Part II's proposed algorithms make the reliability modeling and analysis of the complex multistate system feasible.

LGMar 29, 2022
Consistency regularization-based Deep Polynomial Chaos Neural Network Method for Reliability Analysis

Xiaohu Zheng, Wen Yao, Yunyang Zhang et al.

Polynomial chaos expansion (PCE) is a powerful surrogate model-based reliability analysis method. Generally, a PCE model with a higher expansion order is usually required to obtain an accurate surrogate model for some complex non-linear stochastic systems. However, the high-order PCE increases the number of labeled data required for solving the expansion coefficients. To alleviate this problem, this paper proposes a consistency regularization-based deep polynomial chaos neural network (Deep PCNN) method, including the low-order adaptive PCE model (the auxiliary model) and the high-order polynomial chaos neural network (the main model). The expansion coefficients of the main model are parameterized into the learnable weights of the polynomial chaos neural network, realizing iterative learning of expansion coefficients to obtain more accurate high-order PCE models. The auxiliary model uses a proposed consistency regularization loss function to assist in training the main model. The consistency regularization-based Deep PCNN method can significantly reduce the number of labeled data in constructing a high-order PCE model without losing accuracy by using few labeled data and abundant unlabeled data. A numerical example validates the effectiveness of the consistency regularization-based Deep PCNN method, and then this method is applied to analyze the reliability of two aerospace engineering systems.

RODec 9, 2025Code
RAVES-Calib: Robust, Accurate and Versatile Extrinsic Self Calibration Using Optimal Geometric Features

Haoxin Zhang, Shuaixin Li, Xiaozhou Zhu et al.

In this paper, we present a user-friendly LiDAR-camera calibration toolkit that is compatible with various LiDAR and camera sensors and requires only a single pair of laser points and a camera image in targetless environments. Our approach eliminates the need for an initial transform and remains robust even with large positional and rotational LiDAR-camera extrinsic parameters. We employ the Gluestick pipeline to establish 2D-3D point and line feature correspondences for a robust and automatic initial guess. To enhance accuracy, we quantitatively analyze the impact of feature distribution on calibration results and adaptively weight the cost of each feature based on these metrics. As a result, extrinsic parameters are optimized by filtering out the adverse effects of inferior features. We validated our method through extensive experiments across various LiDAR-camera sensors in both indoor and outdoor settings. The results demonstrate that our method provides superior robustness and accuracy compared to SOTA techniques. Our code is open-sourced on GitHub to benefit the community.

CVJul 9, 2024
Improving the Transferability of Adversarial Examples by Feature Augmentation

Donghua Wang, Wen Yao, Tingsong Jiang et al.

Despite the success of input transformation-based attacks on boosting adversarial transferability, the performance is unsatisfying due to the ignorance of the discrepancy across models. In this paper, we propose a simple but effective feature augmentation attack (FAUG) method, which improves adversarial transferability without introducing extra computation costs. Specifically, we inject the random noise into the intermediate features of the model to enlarge the diversity of the attack gradient, thereby mitigating the risk of overfitting to the specific model and notably amplifying adversarial transferability. Moreover, our method can be combined with existing gradient attacks to augment their performance further. Extensive experiments conducted on the ImageNet dataset across CNN and transformer models corroborate the efficacy of our method, e.g., we achieve improvement of +26.22% and +5.57% on input transformation-based attacks and combination methods, respectively.

CVApr 20, 2023
A Plug-and-Play Defensive Perturbation for Copyright Protection of DNN-based Applications

Donghua Wang, Wen Yao, Tingsong Jiang et al.

Wide deployment of deep neural networks (DNNs) based applications (e.g., style transfer, cartoonish), stimulating the requirement of copyright protection of such application's production. Although some traditional visible copyright techniques are available, they would introduce undesired traces and result in a poor user experience. In this paper, we propose a novel plug-and-play invisible copyright protection method based on defensive perturbation for DNN-based applications (i.e., style transfer). Rather than apply the perturbation to attack the DNNs model, we explore the potential utilization of perturbation in copyright protection. Specifically, we project the copyright information to the defensive perturbation with the designed copyright encoder, which is added to the image to be protected. Then, we extract the copyright information from the encoded copyrighted image with the devised copyright decoder. Furthermore, we use a robustness module to strengthen the decoding capability of the decoder toward images with various distortions (e.g., JPEG compression), which may be occurred when the user posts the image on social media. To ensure the image quality of encoded images and decoded copyright images, a loss function was elaborately devised. Objective and subjective experiment results demonstrate the effectiveness of the proposed method. We have also conducted physical world tests on social media (i.e., Wechat and Twitter) by posting encoded copyright images. The results show that the copyright information in the encoded image saved from social media can still be correctly extracted.

CVMar 10, 2022
Semi-supervision semantic segmentation with uncertainty-guided self cross supervision

Yunyang Zhang, Zhiqiang Gong, Xiaohu Zheng et al.

As a powerful way of realizing semi-supervised segmentation, the cross supervision method learns cross consistency based on independent ensemble models using abundant unlabeled images. However, the wrong pseudo labeling information generated by cross supervision would confuse the training process and negatively affect the effectiveness of the segmentation model. Besides, the training process of ensemble models in such methods also multiplies the cost of computation resources and decreases the training efficiency. To solve these problems, we propose a novel cross supervision method, namely uncertainty-guided self cross supervision (USCS). In addition to ensemble models, we first design a multi-input multi-output (MIMO) segmentation model which can generate multiple outputs with shared model and consequently impose consistency over the outputs, saving the cost on parameters and calculations. On the other hand, we employ uncertainty as guided information to encourage the model to focus on the high confident regions of pseudo labels and mitigate the effects of wrong pseudo labeling in self cross supervision, improving the performance of the segmentation model. Extensive experiments show that our method achieves state-of-the-art performance while saving 40.5% and 49.1% cost on parameters and calculations.

ROAug 2, 2024
HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling

YiFan Hao, Yang Yang, Junru Song et al.

In the field of robotic control, designing individual controllers for each robot leads to high computational costs. Universal control policies, applicable across diverse robot morphologies, promise to mitigate this challenge. Predominantly, models based on Graph Neural Networks (GNN) and Transformers are employed, owing to their effectiveness in capturing relational dynamics across a robot's limbs. However, these models typically employ homogeneous graph structures that overlook the functional diversity of different limbs. To bridge this gap, we introduce HeteroMorpheus, a novel method based on heterogeneous graph Transformer. This method uniquely addresses limb heterogeneity, fostering better representation of robot dynamics of various morphologies. Through extensive experiments we demonstrate the superiority of HeteroMorpheus against state-of-the-art methods in the capability of policy generalization, including zero-shot generalization and sample-efficient transfer to unfamiliar robot morphologies.

LGFeb 23, 2023
Uncertainty Guided Ensemble Self-Training for Semi-Supervised Global Field Reconstruction

Yunyang Zhang, Zhiqiang Gong, Xiaoyu Zhao et al.

Recovering a globally accurate complex physics field from limited sensor is critical to the measurement and control in the aerospace engineering. General reconstruction methods for recovering the field, especially the deep learning with more parameters and better representational ability, usually require large amounts of labeled data which is unaffordable. To solve the problem, this paper proposes Uncertainty Guided Ensemble Self-Training (UGE-ST), using plentiful unlabeled data to improve reconstruction performance. A novel self-training framework with the ensemble teacher and pretraining student designed to improve the accuracy of the pseudo-label and remedy the impact of noise is first proposed. On the other hand, uncertainty-guided learning is proposed to encourage the model to focus on the highly confident regions of pseudo-labels and mitigate the effects of wrong pseudo-labeling in self-training, improving the performance of the reconstruction model. Experiments include the pressure velocity field reconstruction of airfoil and the temperature field reconstruction of aircraft system indicate that our UGE-ST can save up to 90% of the data with the same accuracy as supervised learning.

CVOct 28, 2023
MultiScale Spectral-Spatial Convolutional Transformer for Hyperspectral Image Classification

Zhiqiang Gong, Xian Zhou, Wen Yao

Due to the powerful ability in capturing the global information, Transformer has become an alternative architecture of CNNs for hyperspectral image classification. However, general Transformer mainly considers the global spectral information while ignores the multiscale spatial information of the hyperspectral image. In this paper, we propose a multiscale spectral-spatial convolutional Transformer (MultiscaleFormer) for hyperspectral image classification. First, the developed method utilizes multiscale spatial patches as tokens to formulate the spatial Transformer and generates multiscale spatial representation of each band in each pixel. Second, the spatial representation of all the bands in a given pixel are utilized as tokens to formulate the spectral Transformer and generate the multiscale spectral-spatial representation of each pixel. Besides, a modified spectral-spatial CAF module is constructed in the MultiFormer to fuse cross-layer spectral and spatial information. Therefore, the proposed MultiFormer can capture the multiscale spectral-spatial information and provide better performance than most of other architectures for hyperspectral image classification. Experiments are conducted over commonly used real-world datasets and the comparison results show the superiority of the proposed method.

CVJul 9, 2024
Universal Multi-view Black-box Attack against Object Detectors via Layout Optimization

Donghua Wang, Wen Yao, Tingsong Jiang et al.

Object detectors have demonstrated vulnerability to adversarial examples crafted by small perturbations that can deceive the object detector. Existing adversarial attacks mainly focus on white-box attacks and are merely valid at a specific viewpoint, while the universal multi-view black-box attack is less explored, limiting their generalization in practice. In this paper, we propose a novel universal multi-view black-box attack against object detectors, which optimizes a universal adversarial UV texture constructed by multiple image stickers for a 3D object via the designed layout optimization algorithm. Specifically, we treat the placement of image stickers on the UV texture as a circle-based layout optimization problem, whose objective is to find the optimal circle layout filled with image stickers so that it can deceive the object detector under the multi-view scenario. To ensure reasonable placement of image stickers, two constraints are elaborately devised. To optimize the layout, we adopt the random search algorithm enhanced by the devised important-aware selection strategy to find the most appropriate image sticker for each circle from the image sticker pools. Extensive experiments conducted on four common object detectors suggested that the detection performance decreases by a large magnitude of 74.29% on average in multi-view scenarios. Additionally, a novel evaluation tool based on the photo-realistic simulator is designed to assess the texture-based attack fairly.

LGDec 6, 2022
RBF-MGN:Solving spatiotemporal PDEs with Physics-informed Graph Neural Network

Zixue Xiang, Wei Peng, Wen Yao

Physics-informed neural networks (PINNs) have lately received significant attention as a representative deep learning-based technique for solving partial differential equations (PDEs). Most fully connected network-based PINNs use automatic differentiation to construct loss functions that suffer from slow convergence and difficult boundary enforcement. In addition, although convolutional neural network (CNN)-based PINNs can significantly improve training efficiency, CNNs have difficulty in dealing with irregular geometries with unstructured meshes. Therefore, we propose a novel framework based on graph neural networks (GNNs) and radial basis function finite difference (RBF-FD). We introduce GNNs into physics-informed learning to better handle irregular domains with unstructured meshes. RBF-FD is used to construct a high-precision difference format of the differential equations to guide model training. Finally, we perform numerical experiments on Poisson and wave equations on irregular domains. We illustrate the generalizability, accuracy, and efficiency of the proposed algorithms on different PDE parameters, numbers of collection points, and several types of RBFs.

CRNov 3, 2023
Universal Perturbation-based Secret Key-Controlled Data Hiding

Donghua Wang, Wen Yao, Tingsong Jiang et al.

Deep neural networks (DNNs) are demonstrated to be vulnerable to universal perturbation, a single quasi-perceptible perturbation that can deceive the DNN on most images. However, the previous works are focused on using universal perturbation to perform adversarial attacks, while the potential usability of universal perturbation as data carriers in data hiding is less explored, especially for the key-controlled data hiding method. In this paper, we propose a novel universal perturbation-based secret key-controlled data-hiding method, realizing data hiding with a single universal perturbation and data decoding with the secret key-controlled decoder. Specifically, we optimize a single universal perturbation, which serves as a data carrier that can hide multiple secret images and be added to most cover images. Then, we devise a secret key-controlled decoder to extract different secret images from the single container image constructed by the universal perturbation by using different secret keys. Moreover, a suppress loss function is proposed to prevent the secret image from leakage. Furthermore, we adopt a robust module to boost the decoder's capability against corruption. Finally, A co-joint optimization strategy is proposed to find the optimal universal perturbation and decoder. Extensive experiments are conducted on different datasets to demonstrate the effectiveness of the proposed method. Additionally, the physical test performed on platforms (e.g., WeChat and Twitter) verifies the usability of the proposed method in practice.

CVJul 13, 2023
Multi-objective Evolutionary Search of Variable-length Composite Semantic Perturbations

Jialiang Sun, Wen Yao, Tingsong Jiang et al.

Deep neural networks have proven to be vulnerable to adversarial attacks in the form of adding specific perturbations on images to make wrong outputs. Designing stronger adversarial attack methods can help more reliably evaluate the robustness of DNN models. To release the harbor burden and improve the attack performance, auto machine learning (AutoML) has recently emerged as one successful technique to help automatically find the near-optimal adversarial attack strategy. However, existing works about AutoML for adversarial attacks only focus on $L_{\infty}$-norm-based perturbations. In fact, semantic perturbations attract increasing attention due to their naturalnesses and physical realizability. To bridge the gap between AutoML and semantic adversarial attacks, we propose a novel method called multi-objective evolutionary search of variable-length composite semantic perturbations (MES-VCSP). Specifically, we construct the mathematical model of variable-length composite semantic perturbations, which provides five gradient-based semantic attack methods. The same type of perturbation in an attack sequence is allowed to be performed multiple times. Besides, we introduce the multi-objective evolutionary search consisting of NSGA-II and neighborhood search to find near-optimal variable-length attack sequences. Experimental results on CIFAR10 and ImageNet datasets show that compared with existing methods, MES-VCSP can obtain adversarial examples with a higher attack success rate, more naturalness, and less time cost.

CVAug 15, 2022
A Multi-objective Memetic Algorithm for Auto Adversarial Attack Optimization Design

Jialiang Sun, Wen Yao, Tingsong Jiang et al.

The phenomenon of adversarial examples has been revealed in variant scenarios. Recent studies show that well-designed adversarial defense strategies can improve the robustness of deep learning models against adversarial examples. However, with the rapid development of defense technologies, it also tends to be more difficult to evaluate the robustness of the defensed model due to the weak performance of existing manually designed adversarial attacks. To address the challenge, given the defensed model, the efficient adversarial attack with less computational burden and lower robust accuracy is needed to be further exploited. Therefore, we propose a multi-objective memetic algorithm for auto adversarial attack optimization design, which realizes the automatical search for the near-optimal adversarial attack towards defensed models. Firstly, the more general mathematical model of auto adversarial attack optimization design is constructed, where the search space includes not only the attacker operations, magnitude, iteration number, and loss functions but also the connection ways of multiple adversarial attacks. In addition, we develop a multi-objective memetic algorithm combining NSGA-II and local search to solve the optimization problem. Finally, to decrease the evaluation cost during the search, we propose a representative data selection strategy based on the sorting of cross entropy loss values of each images output by models. Experiments on CIFAR10, CIFAR100, and ImageNet datasets show the effectiveness of our proposed method.

LGDec 1, 2025
Learning to Reconstruct Temperature Field from Sparse Observations with Implicit Physics Priors

Shihang Li, Zhiqiang Gong, Weien Zhou et al.

Accurate reconstruction of temperature field of heat-source systems (TFR-HSS) is crucial for thermal monitoring and reliability assessment in engineering applications such as electronic devices and aerospace structures. However, the high cost of measurement acquisition and the substantial distributional shifts in temperature field across varying conditions present significant challenges for developing reconstruction models with robust generalization capabilities. Existing DNNs-based methods typically formulate TFR-HSS as a one-to-one regression problem based solely on target sparse measurements, without effectively leveraging reference simulation data that implicitly encode thermal knowledge. To address this limitation, we propose IPTR, an implicit physics-guided temperature field reconstruction framework that introduces sparse monitoring-temperature field pair from reference simulations as priors to enrich physical understanding. To integrate both reference and target information, we design a dual physics embedding module consisting of two complementary branches: an implicit physics-guided branch employing cross-attention to distill latent physics from the reference data, and an auxiliary encoding branch based on Fourier layers to capture the spatial characteristics of the target observation. The fused representation is then decoded to reconstruct the full temperature field. Extensive experiments under single-condition, multi-condition, and few-shot settings demonstrate that IPTR consistently outperforms existing methods, achieving state-of-the-art reconstruction accuracy and strong generalization capability.

AINov 13, 2025
Thermally Activated Dual-Modal Adversarial Clothing against AI Surveillance Systems

Jiahuan Long, Tingsong Jiang, Hanqing Liu et al.

Adversarial patches have emerged as a popular privacy-preserving approach for resisting AI-driven surveillance systems. However, their conspicuous appearance makes them difficult to deploy in real-world scenarios. In this paper, we propose a thermally activated adversarial wearable designed to ensure adaptability and effectiveness in complex real-world environments. The system integrates thermochromic dyes with flexible heating units to induce visually dynamic adversarial patterns on clothing surfaces. In its default state, the clothing appears as an ordinary black T-shirt. Upon heating via an embedded thermal unit, hidden adversarial patterns on the fabric are activated, allowing the wearer to effectively evade detection across both visible and infrared modalities. Physical experiments demonstrate that the adversarial wearable achieves rapid texture activation within 50 seconds and maintains an adversarial success rate above 80\% across diverse real-world surveillance environments. This work demonstrates a new pathway toward physically grounded, user-controllable anti-AI systems, highlighting the growing importance of proactive adversarial techniques for privacy protection in the age of ubiquitous AI surveillance.

39.3CVApr 14
Challenging Vision-Language Models with Physically Deployable Multimodal Semantic Lighting Attacks

Yingying Zhao, Chengyin Hu, Qike Zhang et al.

Vision-Language Models (VLMs) have shown remarkable performance, yet their security remains insufficiently understood. Existing adversarial studies focus almost exclusively on the digital setting, leaving physical-world threats largely unexplored. As VLMs are increasingly deployed in real environments, this gap becomes critical, since adversarial perturbations must be physically realizable. Despite this practical relevance, physical attacks against VLMs have not been systematically studied. Such attacks may induce recognition failures and further disrupt multimodal reasoning, leading to severe semantic misinterpretation in downstream tasks. Therefore, investigating physical attacks on VLMs is essential for assessing their real-world security risks. To address this gap, we propose Multimodal Semantic Lighting Attacks (MSLA), the first physically deployable adversarial attack framework against VLMs. MSLA uses controllable adversarial lighting to disrupt multimodal semantic understanding in real scenes, attacking semantic alignment rather than only task-specific outputs. Consequently, it degrades zero-shot classification performance of mainstream CLIP variants while inducing severe semantic hallucinations in advanced VLMs such as LLaVA and BLIP across image captioning and visual question answering (VQA). Extensive experiments in both digital and physical domains demonstrate that MSLA is effective, transferable, and practically realizable. Our findings provide the first evidence that VLMs are highly vulnerable to physically deployable semantic attacks, exposing a previously overlooked robustness gap and underscoring the urgent need for physical-world robustness evaluation of VLMs.

CVDec 19, 2025
SynergyWarpNet: Attention-Guided Cooperative Warping for Neural Portrait Animation

Shihang Li, Zhiqiang Gong, Minming Ye et al.

Recent advances in neural portrait animation have demonstrated remarked potential for applications in virtual avatars, telepresence, and digital content creation. However, traditional explicit warping approaches often struggle with accurate motion transfer or recovering missing regions, while recent attention-based warping methods, though effective, frequently suffer from high complexity and weak geometric grounding. To address these issues, we propose SynergyWarpNet, an attention-guided cooperative warping framework designed for high-fidelity talking head synthesis. Given a source portrait, a driving image, and a set of reference images, our model progressively refines the animation in three stages. First, an explicit warping module performs coarse spatial alignment between the source and driving image using 3D dense optical flow. Next, a reference-augmented correction module leverages cross-attention across 3D keypoints and texture features from multiple reference images to semantically complete occluded or distorted regions. Finally, a confidence-guided fusion module integrates the warped outputs with spatially-adaptive fusing, using a learned confidence map to balance structural alignment and visual consistency. Comprehensive evaluations on benchmark datasets demonstrate state-of-the-art performance.

AIFeb 17, 2025Code
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration

Shao Zhang, Xihuai Wang, Wenhao Zhang et al.

Agents built on large language models (LLMs) have excelled in turn-by-turn human-AI collaboration but struggle with simultaneous tasks requiring real-time interaction. Latency issues and the challenge of inferring variable human strategies hinder their ability to make autonomous decisions without explicit instructions. Through experiments with current independent System 1 and System 2 methods, we validate the necessity of using Dual Process Theory (DPT) in real-time tasks. We propose DPT-Agent, a novel language agent framework that integrates System 1 and System 2 for efficient real-time simultaneous human-AI collaboration. DPT-Agent's System 1 uses a Finite-state Machine (FSM) and code-as-policy for fast, intuitive, and controllable decision-making. DPT-Agent's System 2 integrates Theory of Mind (ToM) and asynchronous reflection to infer human intentions and perform reasoning-based autonomous decisions. We demonstrate the effectiveness of DPT-Agent through further experiments with rule-based agents and human collaborators, showing significant improvements over mainstream LLM-based frameworks. DPT-Agent can effectively help LLMs convert correct slow thinking and reasoning into executable actions, thereby improving performance. To the best of our knowledge, DPT-Agent is the first language agent framework that achieves successful real-time simultaneous human-AI collaboration autonomously. Code of DPT-Agent can be found in https://github.com/sjtu-marl/DPT-Agent.

52.9ROApr 9
Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

Xingyu Xia, Lekai Zhou, Yujie Tang et al.

Aerial vision-and-language navigation (Aerial VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and autonomously navigate complex three-dimensional environments by grounding language in visual perception. This survey provides a critical and analytical review of the Aerial VLN field, with particular attention to the recent integration of large language models (LLMs) and vision-language models (VLMs). We first formally introduce the Aerial VLN problem and define two interaction paradigms: single-instruction and dialog-based, as foundational axes. We then organize the body of Aerial VLN methods into a taxonomy of five architectural categories: sequence-to-sequence and attention-based methods, end-to-end LLM/VLM methods, hierarchical methods, multi-agent methods, and dialog-based navigation methods. For each category, we systematically analyze design rationales, technical trade-offs, and reported performance. We critically assess the evaluation infrastructure for Aerial VLN, including datasets, simulation platforms, and metrics, and identify their gaps in scale, environmental diversity, real-world grounding, and metric coverage. We consolidate cross-method comparisons on shared benchmarks and analyze key architectural trade-offs, including discrete versus continuous actions, end-to-end versus hierarchical designs, and the simulation-to-reality gap. Finally, we synthesize seven concrete open problems: long-horizon instruction grounding, viewpoint robustness, scalable spatial representation, continuous 6-DoF action execution, onboard deployment, benchmark standardization, and multi-UAV swarm navigation, with specific research directions grounded in the evidence presented throughout the survey.

CLJan 29
$G^2$-Reader: Dual Evolving Graphs for Multimodal Document QA

Yaxin Du, Junru Song, Yifan Zhou et al.

Retrieval-augmented generation is a practical paradigm for question answering over long documents, but it remains brittle for multimodal reading where text, tables, and figures are interleaved across many pages. First, flat chunking breaks document-native structure and cross-modal alignment, yielding semantic fragments that are hard to interpret in isolation. Second, even iterative retrieval can fail in long contexts by looping on partial evidence or drifting into irrelevant sections as noise accumulates, since each step is guided only by the current snippet without a persistent global search state. We introduce $G^2$-Reader, a dual-graph system, to address both issues. It evolves a Content Graph to preserve document-native structure and cross-modal semantics, and maintains a Planning Graph, an agentic directed acyclic graph of sub-questions, to track intermediate findings and guide stepwise navigation for evidence completion. On VisDoMBench across five multimodal domains, $G^2$-Reader with Qwen3-VL-32B-Instruct reaches 66.21\% average accuracy, outperforming strong baselines and a standalone GPT-5 (53.08\%).

LGMay 12, 2025Code
You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts

Hongkun Dou, Zeyu Li, Xingyu Jiang et al.

Diffusion models (DMs) have recently demonstrated remarkable success in modeling large-scale data distributions. However, many downstream tasks require guiding the generated content based on specific differentiable metrics, typically necessitating backpropagation during the generation process. This approach is computationally expensive, as generating with DMs often demands tens to hundreds of recursive network calls, resulting in high memory usage and significant time consumption. In this paper, we propose a more efficient alternative that approaches the problem from the perspective of parallel denoising. We show that full backpropagation throughout the entire generation process is unnecessary. The downstream metrics can be optimized by retaining the computational graph of only one step during generation, thus providing a shortcut for gradient propagation. The resulting method, which we call Shortcut Diffusion Optimization (SDO), is generic, high-performance, and computationally lightweight, capable of optimizing all parameter types in diffusion sampling. We demonstrate the effectiveness of SDO on several real-world tasks, including controlling generation by optimizing latent and aligning the DMs by fine-tuning network parameters. Compared to full backpropagation, our approach reduces computational costs by $\sim 90\%$ while maintaining superior performance. Code is available at https://github.com/deng-ai-lab/SDO.

LGMar 7, 2022Code
$A^{3}D$: A Platform of Searching for Robust Neural Architectures and Efficient Adversarial Attacks

Jialiang Sun, Wen Yao, Tingsong Jiang et al.

The robustness of deep neural networks (DNN) models has attracted increasing attention due to the urgent need for security in many applications. Numerous existing open-sourced tools or platforms are developed to evaluate the robustness of DNN models by ensembling the majority of adversarial attack or defense algorithms. Unfortunately, current platforms do not possess the ability to optimize the architectures of DNN models or the configuration of adversarial attacks to further enhance the robustness of models or the performance of adversarial attacks. To alleviate these problems, in this paper, we first propose a novel platform called auto adversarial attack and defense ($A^{3}D$), which can help search for robust neural network architectures and efficient adversarial attacks. In $A^{3}D$, we employ multiple neural architecture search methods, which consider different robustness evaluation metrics, including four types of noises: adversarial noise, natural noise, system noise, and quantified metrics, resulting in finding robust architectures. Besides, we propose a mathematical model for auto adversarial attack, and provide multiple optimization algorithms to search for efficient adversarial attacks. In addition, we combine auto adversarial attack and defense together to form a unified framework. Among auto adversarial defense, the searched efficient attack can be used as the new robustness evaluation to further enhance the robustness. In auto adversarial attack, the searched robust architectures can be utilized as the threat model to help find stronger adversarial attacks. Experiments on CIFAR10, CIFAR100, and ImageNet datasets demonstrate the feasibility and effectiveness of the proposed platform, which can also provide a benchmark and toolkit for researchers in the application of automated machine learning in evaluating and improving the DNN model robustnesses.

LGJul 9, 2021Code
IDRLnet: A Physics-Informed Neural Network Library

Wei Peng, Jun Zhang, Weien Zhou et al.

Physics Informed Neural Network (PINN) is a scientific computing framework used to solve both forward and inverse problems modeled by Partial Differential Equations (PDEs). This paper introduces IDRLnet, a Python toolbox for modeling and solving problems through PINN systematically. IDRLnet constructs the framework for a wide range of PINN algorithms and applications. It provides a structured way to incorporate geometric objects, data sources, artificial neural networks, loss metrics, and optimizers within Python. Furthermore, it provides functionality to solve noisy inverse problems, variational minimization, and integral differential equations. New PINN variants can be integrated into the framework easily. Source code, tutorials, and documentation are available at \url{https://github.com/idrl-lab/idrlnet}.

96.8ROApr 7
JailWAM: Jailbreaking World Action Models in Robot Control

Hanqing Liu, Songping Wang, Jiahuan Long et al.

The World Action Model (WAM) can jointly predict future world states and actions, exhibiting stronger physical manipulation capabilities compared with traditional models. Such powerful physical interaction ability is a double-edged sword: if safety is ignored, it will directly threaten personal safety, property security and environmental safety. However, existing research pays extremely limited attention to the critical security gap: the vulnerability of WAM to jailbreak attacks. To fill this gap, we define the Three-Level Safety Classification Framework to systematically quantify the safety of robotic arm motions. Furthermore, we propose JailWAM, the first dedicated jailbreak attack and evaluation framework for WAM, which consists of three core components: (1) Visual-Trajectory Mapping, which unifies heterogeneous action spaces into visual trajectory representations and enables cross-architectural unified evaluation; (2) Risk Discriminator, which serves as a high-recall screening tool that optimizes the efficiency-accuracy trade-off when identifying destructive behaviors in visual trajectories; (3) Dual-Path Verification Strategy, which first conducts rapid coarse screening via a single-image-based video-action generation module, and then performs efficient and comprehensive verification through full closed-loop physical simulation. In addition, we construct JailWAM-Bench, a benchmark for comprehensively evaluating the safety alignment performance of WAM under jailbreak attacks. Experiments in RoboTwin simulation environment demonstrate that the proposed framework efficiently exposes physical vulnerabilities, achieving an 84.2% attack success rate on the state-of-the-art LingBot-VA. Meanwhile, robust defense mechanisms can be constructed based on JailWAM, providing an effective technical solution for designing safe and reliable robot control systems.

76.1CVApr 3
Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models

Chengyin Hu, Yuxian Dong, Yikun Guo et al.

Infrared vision-language models (IR-VLMs) have emerged as a promising paradigm for multimodal perception in low-visibility environments, yet their robustness to adversarial attacks remains largely unexplored. Existing adversarial patch methods are mainly designed for RGB-based models in closed-set settings and are not readily applicable to the open-ended semantic understanding and physical deployment requirements of infrared VLMs. To bridge this gap, we propose Universal Curved-Grid Patch (UCGP), a universal physical adversarial patch framework for IR-VLMs. UCGP integrates Curved-Grid Mesh (CGM) parameterization for continuous, low-frequency, and deployable patch generation with a unified representation-driven objective that promotes subspace departure, topology disruption, and stealth. To improve robustness under real-world deployment and domain shift, we further incorporate Meta Differential Evolution and EOT-augmented TPS deformation modeling. Rather than manipulating labels or prompts, UCGP directly disrupts the visual representation space, weakening cross-modal semantic alignment. Extensive experiments demonstrate that UCGP consistently compromises semantic understanding across diverse IR-VLM architectures while maintaining cross-model transferability, cross-dataset generalization, real-world physical effectiveness, and robustness against defenses. These findings reveal a previously overlooked robustness vulnerability in current infrared multimodal systems.

ROSep 23, 2025
Eva-VLA: Evaluating Vision-Language-Action Models' Robustness Under Real-World Physical Variations

Hanqing Liu, Jiahuan Long, Junqi Wu et al.

Vision-Language-Action (VLA) models have emerged as promising solutions for robotic manipulation, yet their robustness to real-world physical variations remains critically underexplored. To bridge this gap, we propose Eva-VLA, the first unified framework that systematically evaluates the robustness of VLA models by transforming discrete physical variations into continuous optimization problems. However, comprehensively assessing VLA robustness presents two key challenges: (1) how to systematically characterize diverse physical variations encountered in real-world deployments while maintaining evaluation reproducibility, and (2) how to discover worst-case scenarios without prohibitive real-world data collection costs efficiently. To address the first challenge, we decompose real-world variations into three critical domains: object 3D transformations that affect spatial reasoning, illumination variations that challenge visual perception, and adversarial patches that disrupt scene understanding. For the second challenge, we introduce a continuous black-box optimization framework that transforms discrete physical variations into parameter optimization, enabling systematic exploration of worst-case scenarios. Extensive experiments on state-of-the-art OpenVLA models across multiple benchmarks reveal alarming vulnerabilities: all variation types trigger failure rates exceeding 60%, with object transformations causing up to 97.8% failure in long-horizon tasks. Our findings expose critical gaps between controlled laboratory success and unpredictable deployment readiness, while the Eva-VLA framework provides a practical pathway for hardening VLA-based robotic manipulation models against real-world deployment challenges.

CVApr 11, 2025
Robust SAM: On the Adversarial Robustness of Vision Foundation Models

Jiahuan Long, Zhengqin Xu, Tingsong Jiang et al.

The Segment Anything Model (SAM) is a widely used vision foundation model with diverse applications, including image segmentation, detection, and tracking. Given SAM's wide applications, understanding its robustness against adversarial attacks is crucial for real-world deployment. However, research on SAM's robustness is still in its early stages. Existing attacks often overlook the role of prompts in evaluating SAM's robustness, and there has been insufficient exploration of defense methods to balance the robustness and accuracy. To address these gaps, this paper proposes an adversarial robustness framework designed to evaluate and enhance the robustness of SAM. Specifically, we introduce a cross-prompt attack method to enhance the attack transferability across different prompt types. Besides attacking, we propose a few-parameter adaptation strategy to defend SAM against various adversarial attacks. To balance robustness and accuracy, we use the singular value decomposition (SVD) to constrain the space of trainable parameters, where only singular values are adaptable. Experiments demonstrate that our cross-prompt attack method outperforms previous approaches in terms of attack success rate on both SAM and SAM 2. By adapting only 512 parameters, we achieve at least a 15\% improvement in mean intersection over union (mIoU) against various adversarial attacks. Compared to previous defense methods, our approach enhances the robustness of SAM while maximally maintaining its original performance.

CVApr 12, 2025
PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking

Jiahuan Long, Tingsong Jiang, Wen Yao et al.

Tracking multiple objects in a continuous video stream is crucial for many computer vision tasks. It involves detecting and associating objects with their respective identities across successive frames. Despite significant progress made in multiple object tracking (MOT), recent studies have revealed the vulnerability of existing MOT methods to adversarial attacks. Nevertheless, all of these attacks belong to digital attacks that inject pixel-level noise into input images, and are therefore ineffective in physical scenarios. To fill this gap, we propose PapMOT, which can generate physical adversarial patches against MOT for both digital and physical scenarios. Besides attacking the detection mechanism, PapMOT also optimizes a printable patch that can be detected as new targets to mislead the identity association process. Moreover, we introduce a patch enhancement strategy to further degrade the temporal consistency of tracking results across video frames, resulting in more aggressive attacks. We further develop new evaluation metrics to assess the robustness of MOT against such attacks. Extensive evaluations on multiple datasets demonstrate that our PapMOT can successfully attack various architectures of MOT trackers in digital scenarios. We also validate the effectiveness of PapMOT for physical attacks by deploying printed adversarial patches in the real world.

CVApr 15, 2025
CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors

Jiahuan Long, Wen Yao, Tingsong Jiang et al.

Adversarial patches are widely used to evaluate the robustness of object detection systems in real-world scenarios. These patches were initially designed to deceive single-modal detectors (e.g., visible or infrared) and have recently been extended to target visible-infrared dual-modal detectors. However, existing dual-modal adversarial patch attacks have limited attack effectiveness across diverse physical scenarios. To address this, we propose CDUPatch, a universal cross-modal patch attack against visible-infrared object detectors across scales, views, and scenarios. Specifically, we observe that color variations lead to different levels of thermal absorption, resulting in temperature differences in infrared imaging. Leveraging this property, we propose an RGB-to-infrared adapter that maps RGB patches to infrared patches, enabling unified optimization of cross-modal patches. By learning an optimal color distribution on the adversarial patch, we can manipulate its thermal response and generate an adversarial infrared texture. Additionally, we introduce a multi-scale clipping strategy and construct a new visible-infrared dataset, MSDrone, which contains aerial vehicle images in varying scales and perspectives. These data augmentation strategies enhance the robustness of our patch in real-world conditions. Experiments on four benchmark datasets (e.g., DroneVehicle, LLVIP, VisDrone, MSDrone) show that our method outperforms existing patch attacks in the digital domain. Extensive physical tests further confirm strong transferability across scales, views, and scenarios.

LGMay 21, 2025
FR-Mamba: Time-Series Physical Field Reconstruction Based on State Space Model

Jiahuan Long, Wenzhe Zhang, Ning Wang et al.

Physical field reconstruction (PFR) aims to predict the state distribution of physical quantities (e.g., velocity, pressure, and temperature) based on limited sensor measurements. It plays a critical role in domains such as fluid dynamics and thermodynamics. However, existing deep learning methods often fail to capture long-range temporal dependencies, resulting in suboptimal performance on time-evolving physical systems. To address this, we propose FR-Mamba, a novel spatiotemporal flow field reconstruction framework based on state space modeling. Specifically, we design a hybrid neural network architecture that combines Fourier Neural Operator (FNO) and State Space Model (SSM) to capture both global spatial features and long-range temporal dependencies. We adopt Mamba, a recently proposed efficient SSM architecture, to model long-range temporal dependencies with linear time complexity. In parallel, the FNO is employed to capture non-local spatial features by leveraging frequency-domain transformations. The spatiotemporal representations extracted by these two components are then fused to reconstruct the full-field distribution of the physical system. Extensive experiments demonstrate that our approach significantly outperforms existing PFR methods in flow field reconstruction tasks, achieving high-accuracy performance on long sequences.

CVApr 11, 2025
Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models

Jiahuan Long, Tingsong Jiang, Wen Yao et al.

Vision foundation models (VFMs) have demonstrated remarkable capabilities in learning universal visual representations. However, adapting these models to downstream tasks conventionally requires parameter updates, with even parameter-efficient fine-tuning methods necessitating the modification of thousands to millions of weights. In this paper, we investigate the redundancies in the segment anything model (SAM) and then propose a novel parameter-free fine-tuning method. Unlike traditional fine-tuning methods that adjust parameters, our method emphasizes selecting, reusing, and enhancing pre-trained features, offering a new perspective on fine-tuning foundation models. Specifically, we introduce a channel selection algorithm based on the model's output difference to identify redundant and effective channels. By selectively replacing the redundant channels with more effective ones, we filter out less useful features and reuse more task-irrelevant features to downstream tasks, thereby enhancing the task-specific feature representation. Experiments on both out-of-domain and in-domain datasets demonstrate the efficiency and effectiveness of our method in different vision tasks (e.g., image segmentation, depth estimation and image classification). Notably, our approach can seamlessly integrate with existing fine-tuning strategies (e.g., LoRA, Adapter), further boosting the performance of already fine-tuned models. Moreover, since our channel selection involves only model inference, our method significantly reduces GPU memory overhead.