LGJan 29, 2023Code
Towards Verifying the Geometric Robustness of Large-scale Neural NetworksFu Wang, Peipei Xu, Wenjie Ruan et al.
Deep neural networks (DNNs) are known to be vulnerable to adversarial geometric transformation. This paper aims to verify the robustness of large-scale DNNs against the combination of multiple geometric transformations with a provable guarantee. Given a set of transformations (e.g., rotation, scaling, etc.), we develop GeoRobust, a black-box robustness analyser built upon a novel global optimisation strategy, for locating the worst-case combination of transformations that affect and even alter a network's output. GeoRobust can provide provable guarantees on finding the worst-case combination based on recent advances in Lipschitzian theory. Due to its black-box nature, GeoRobust can be deployed on large-scale DNNs regardless of their architectures, activation functions, and the number of neurons. In practice, GeoRobust can locate the worst-case geometric transformation with high precision for the ResNet50 model on ImageNet in a few seconds on average. We examined 18 ImageNet classifiers, including the ResNet family and vision transformers, and found a positive correlation between the geometric robustness of the networks and the parameter numbers. We also observe that increasing the depth of DNN is more beneficial than increasing its width in terms of improving its geometric robustness. Our tool GeoRobust is available at https://github.com/TrustAI/GeoRobust.
CVJun 3
Disentangled Fine-Grained Prototype Learning for Incomplete Image-Tabular ClassificationFeixiang Zhou, Jianyang Xie, Zhuangzhi Gao et al.
The missing-modality problem poses a significant challenge in image-tabular multimodal learning across a wide range of multimedia applications, including product understanding, recommendation systems, and medical diagnosis. This challenge is particularly pronounced when the two modalities are highly heterogeneous, as images and tabular attributes differ substantially in their semantic granularity and data distributions. Existing methods learn modality-invariant representations through disentanglement and alignment over global token-averaged features, capturing only coarse cross-modal consistency and overlooking fine-grained semantic and distributional misalignment, which hampers the exploitation of complementary cues under missing modalities. To address this, we propose DFPL, a novel framework for fine-grained prototype learning. Specifically, Shared-Specific Prototype Modeling (SSPM) extracts compact and diverse shared and modality-specific prototypes, and further performs prototype-level disentanglement to suppress redundant intra-modality correlations. Additionally, we propose a Prototype-guided Fine-grained Alignment (PFA) module that jointly enforces prototype-level distribution matching and prototype-to-class semantic alignment within a unified prototype space, thereby preserving both fine-grained distributional and semantic consistency across modalities. We further introduce a Class-aware Multi-scale Aggregation (CMA) module to adaptively aggregate shared semantics and modality-specific characteristics from global and prototype levels for robust predictions. Extensive experiments on three diverse image-tabular benchmarks demonstrate the superiority of our method compared to the previous approaches under various missing-modality settings. Code will be made publicly available.
GEO-PHJun 22, 2023
A prior regularized full waveform inversion using generative diffusion modelsFu Wang, Xinquan Huang, Tariq Alkhalifah
Full waveform inversion (FWI) has the potential to provide high-resolution subsurface model estimations. However, due to limitations in observation, e.g., regional noise, limited shots or receivers, and band-limited data, it is hard to obtain the desired high-resolution model with FWI. To address this challenge, we propose a new paradigm for FWI regularized by generative diffusion models. Specifically, we pre-train a diffusion model in a fully unsupervised manner on a prior velocity model distribution that represents our expectations of the subsurface and then adapt it to the seismic observations by incorporating the FWI into the sampling process of the generative diffusion models. What makes diffusion models uniquely appropriate for such an implementation is that the generative process retains the form and dimensions of the velocity model. Numerical examples demonstrate that our method can outperform the conventional FWI with only negligible additional computational cost. Even in cases of very sparse observations or observations with strong noise, the proposed method could still reconstruct a high-quality subsurface model. Thus, we can incorporate our prior expectations of the solutions in an efficient manner. We further test this approach on field data, which demonstrates the effectiveness of the proposed method.
LGApr 3, 2023
Model-Agnostic Reachability Analysis on Deep Neural NetworksChi Zhang, Wenjie Ruan, Fu Wang et al.
Verification plays an essential role in the formal analysis of safety-critical systems. Most current verification methods have specific requirements when working on Deep Neural Networks (DNNs). They either target one particular network category, e.g., Feedforward Neural Networks (FNNs), or networks with specific activation functions, e.g., RdLU. In this paper, we develop a model-agnostic verification framework, called DeepAgn, and show that it can be applied to FNNs, Recurrent Neural Networks (RNNs), or a mixture of both. Under the assumption of Lipschitz continuity, DeepAgn analyses the reachability of DNNs based on a novel optimisation scheme with a global convergence guarantee. It does not require access to the network's internal structures, such as layers and parameters. Through reachability analysis, DeepAgn can tackle several well-known robustness problems, including computing the maximum safe radius for a given input, and generating the ground-truth adversarial examples. We also empirically demonstrate DeepAgn's superior capability and efficiency in handling a broader class of deep neural networks, including both FNNs, and RNNs with very deep layers and millions of neurons, than other state-of-the-art verification approaches.
IRDec 7, 2025
WisPaper: Your AI Scholar Search EngineLi Ju, Jun Zhao, Mingxu Chai et al.
Researchers struggle to efficiently locate and manage relevant literature within the exponentially growing body of scientific publications. We present \textsc{WisPaper}, an intelligent academic retrieval and literature management platform that addresses this challenge through three integrated capabilities: (1) \textit{Scholar Search}, featuring both quick keyword-based and deep agentic search modes for efficient paper discovery; (2) \textit{Library}, a customizable knowledge base for systematic literature organization; and (3) \textit{AI Feeds}, an intelligent recommendation system that automatically delivers relevant new publications based on user interests. Unlike existing academic tools, \textsc{WisPaper} provides a closed-loop workflow that seamlessly connects literature discovery, management, and continuous tracking of research frontiers. Our multilingual and multidisciplinary system significantly reduces the time researchers from diverse backgrounds spend on paper screening and management, enabling them to focus on their core research activities. The platform is publicly accessible and serves researchers across academia and industry.
GEO-PHJul 25, 2024
Diffusion-based subsurface CO$_2$ multiphysics monitoring and forecastingXinquan Huang, Fu Wang, Tariq Alkhalifah
Carbon capture and storage (CCS) plays a crucial role in mitigating greenhouse gas emissions, particularly from industrial outputs. Using seismic monitoring can aid in an accurate and robust monitoring system to ensure the effectiveness of CCS and mitigate associated risks. However, conventional seismic wave equation-based approaches are computationally demanding, which hinders real-time applications. In addition to efficiency, forecasting and uncertainty analysis are not easy to handle using such numerical-simulation-based approaches. To this end, we propose a novel subsurface multiphysics monitoring and forecasting framework utilizing video diffusion models. This approach can generate high-quality representations of CO$2$ evolution and associated changes in subsurface elastic properties. With reconstruction guidance, forecasting and inversion can be achieved conditioned on historical frames and/or observational data. Meanwhile, due to the generative nature of the approach, we can quantify uncertainty in the prediction. Tests based on the Compass model show that the proposed method successfully captured the inherently complex physical phenomena associated with CO$_2$ monitoring, and it can predict and invert the subsurface elastic properties and CO$_2$ saturation with consistency in their evolution.
CVJan 4, 2021Code
Fooling Object Detectors: Adversarial Attacks by Half-Neighbor MasksYanghao Zhang, Fu Wang, Wenjie Ruan
Although there are a great number of adversarial attacks on deep learning based classifiers, how to attack object detection systems has been rarely studied. In this paper, we propose a Half-Neighbor Masked Projected Gradient Descent (HNM-PGD) based attack, which can generate strong perturbation to fool different kinds of detectors under strict constraints. We also applied the proposed HNM-PGD attack in the CIKM 2020 AnalytiCup Competition, which was ranked within the top 1% on the leaderboard. We release the code at https://github.com/YanghaoZYH/HNM-PGD.
CVOct 15, 2020Code
Generalizing Universal Adversarial Attacks Beyond Additive PerturbationsYanghao Zhang, Wenjie Ruan, Fu Wang et al.
The previous study has shown that universal adversarial attacks can fool deep neural networks over a large set of input images with a single human-invisible perturbation. However, current methods for universal adversarial attacks are based on additive perturbation, which cause misclassification when the perturbation is directly added to the input images. In this paper, for the first time, we show that a universal adversarial attack can also be achieved via non-additive perturbation (e.g., spatial transformation). More importantly, to unify both additive and non-additive perturbations, we propose a novel unified yet flexible framework for universal adversarial attacks, called GUAP, which is able to initiate attacks by additive perturbation, non-additive perturbation, or the combination of both. Extensive experiments are conducted on CIFAR-10 and ImageNet datasets with six deep neural network models including GoogleLeNet, VGG16/19, ResNet101/152, and DenseNet121. The empirical experiments demonstrate that GUAP can obtain up to 90.9% and 99.24% successful attack rates on CIFAR-10 and ImageNet datasets, leading to over 15% and 19% improvements respectively than current state-of-the-art universal adversarial attacks. The code for reproducing the experiments in this paper is available at https://github.com/TrustAI/GUAP.
LGMar 31
Hybrid Quantum-Classical Spatiotemporal Forecasting for 3D Cloud FieldsFu Wang, Qifeng Lu, Xinyu Long et al.
Accurate forecasting of three-dimensional (3D) cloud fields is important for atmospheric analysis and short-range numerical weather prediction, yet it remains challenging because cloud evolution involves cross-layer interactions, nonlocal dependencies, and multiscale spatiotemporal dynamics. Existing spatiotemporal prediction models based on convolutions, recurrence, or attention often rely on locality-biased representations and therefore struggle to preserve fine cloud structures in volumetric forecasting tasks. To address this issue, we propose QENO, a hybrid quantum-inspired spatiotemporal forecasting framework for 3D cloud fields. The proposed architecture consists of four components: a classical spatiotemporal encoder for compact latent representation, a topology-aware quantum enhancement block for modeling nonlocal couplings in latent space, a dynamic fusion temporal unit for integrating measurement-derived quantum features with recurrent memory, and a decoder for reconstructing future cloud volumes. Experiments on CMA-MESO 3D cloud fields show that QENO consistently outperforms representative baselines, including ConvLSTM, PredRNN++, Earthformer, TAU, and SimVP variants, in terms of MSE, MAE, RMSE, SSIM, and threshold-based detection metrics. In particular, QENO achieves an MSE of 0.2038, an RMSE of 0.4514, and an SSIM of 0.6291, while also maintaining a compact parameter budget. These results indicate that topology-aware hybrid quantum-classical feature modeling is a promising direction for 3D cloud structure forecasting and atmospheric Earth observation data analysis.
GEO-PHFeb 9, 2024
Controllable seismic velocity synthesis using generative diffusion modelsFu Wang, Xinquan Huang, Tariq Alkhalifah
Accurate seismic velocity estimations are vital to understanding Earth's subsurface structures, assessing natural resources, and evaluating seismic hazards. Machine learning-based inversion algorithms have shown promising performance in regional (i.e., for exploration) and global velocity estimation, while their effectiveness hinges on access to large and diverse training datasets whose distributions generally cover the target solutions. Additionally, enhancing the precision and reliability of velocity estimation also requires incorporating prior information, e.g., geological classes, well logs, and subsurface structures, but current statistical or neural network-based methods are not flexible enough to handle such multi-modal information. To address both challenges, we propose to use conditional generative diffusion models for seismic velocity synthesis, in which we readily incorporate those priors. This approach enables the generation of seismic velocities that closely match the expected target distribution, offering datasets informed by both expert knowledge and measured data to support training for data-driven geophysical methods. We demonstrate the flexibility and effectiveness of our method through training diffusion models on the OpenFWI dataset under various conditions, including class labels, well logs, reflectivity images, and the combination of these priors. The performance of the approach under out-of-distribution conditions further underscores its generalization ability, showcasing its potential to provide tailored priors for velocity inverse problems and create specific training datasets for machine learning-based geophysical applications.
GEO-PHSep 10, 2025
Physics-informed waveform inversion using pretrained wavefield neural operatorsXinquan Huang, Fu Wang, Tariq Alkhalifah
Full waveform inversion (FWI) is crucial for reconstructing high-resolution subsurface models, but it is often hindered, considering the limited data, by its null space resulting in low-resolution models, and more importantly, by its computational cost, especially if needed for real-time applications. Recent attempts to accelerate FWI using learned wavefield neural operators have shown promise in efficiency and differentiability, but typically suffer from noisy and unstable inversion performance. To address these limitations, we introduce a novel physics-informed FWI framework to enhance the inversion in accuracy while maintaining the efficiency of neural operator-based FWI. Instead of relying only on the L2 norm objective function via automatic differentiation, resulting in noisy model reconstruction, we integrate a physics constraint term in the loss function of FWI, improving the quality of the inverted velocity models. Specifically, starting with an initial model to simulate wavefields and then evaluating the loss over how much the resulting wavefield obeys the physical laws (wave equation) and matches the recorded data, we achieve a reduction in noise and artifacts. Numerical experiments using the OpenFWI and Overthrust models demonstrate our method's superior performance, offering cleaner and more accurate subsurface velocity than vanilla approaches. Considering the efficiency of the approach compared to FWI, this advancement represents a significant step forward in the practical application of FWI for real-time subsurface monitoring.
CVJul 13, 2025
Memory-Augmented SAM2 for Training-Free Surgical Video SegmentationMing Yin, Fu Wang, Xujiong Ye et al.
Surgical video segmentation is a critical task in computer-assisted surgery, essential for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has demonstrated remarkable advancements in both image and video segmentation. However, the inherent limitations of SAM2's greedy selection memory design are amplified by the unique properties of surgical videos-rapid instrument movement, frequent occlusion, and complex instrument-tissue interaction-resulting in diminished performance in the segmentation of complex, long videos. To address these challenges, we introduce Memory Augmented (MA)-SAM2, a training-free video object segmentation strategy, featuring novel context-aware and occlusion-resilient memory models. MA-SAM2 exhibits strong robustness against occlusions and interactions arising from complex instrument movements while maintaining accuracy in segmenting objects throughout videos. Employing a multi-target, single-loop, one-prompt inference further enhances the efficiency of the tracking process in multi-instrument videos. Without introducing any additional parameters or requiring further training, MA-SAM2 achieved performance improvements of 4.36% and 6.1% over SAM2 on the EndoVis2017 and EndoVis2018 datasets, respectively, demonstrating its potential for practical surgical applications.
CVSep 19, 2025
SGMAGNet: A Baseline Model for 3D Cloud Phase Structure Reconstruction on a New Passive Active Satellite BenchmarkChi Yang, Fu Wang, Xiaofei Yang et al.
Cloud phase profiles are critical for numerical weather prediction (NWP), as they directly affect radiative transfer and precipitation processes. In this study, we present a benchmark dataset and a baseline framework for transforming multimodal satellite observations into detailed 3D cloud phase structures, aiming toward operational cloud phase profile retrieval and future integration with NWP systems to improve cloud microphysics parameterization. The multimodal observations consist of (1) high--spatiotemporal--resolution, multi-band visible (VIS) and thermal infrared (TIR) imagery from geostationary satellites, and (2) accurate vertical cloud phase profiles from spaceborne lidar (CALIOP\slash CALIPSO) and radar (CPR\slash CloudSat). The dataset consists of synchronized image--profile pairs across diverse cloud regimes, defining a supervised learning task: given VIS/TIR patches, predict the corresponding 3D cloud phase structure. We adopt SGMAGNet as the main model and compare it with several baseline architectures, including UNet variants and SegNet, all designed to capture multi-scale spatial patterns. Model performance is evaluated using standard classification metrics, including Precision, Recall, F1-score, and IoU. The results demonstrate that SGMAGNet achieves superior performance in cloud phase reconstruction, particularly in complex multi-layer and boundary transition regions. Quantitatively, SGMAGNet attains a Precision of 0.922, Recall of 0.858, F1-score of 0.763, and an IoU of 0.617, significantly outperforming all baselines across these key metrics.
CVJun 22, 2025
OSDMamba: Enhancing Oil Spill Detection from Remote Sensing Images Using Selective State Space ModelShuaiyu Chen, Fu Wang, Peng Ren et al.
Semantic segmentation is commonly used for Oil Spill Detection (OSD) in remote sensing images. However, the limited availability of labelled oil spill samples and class imbalance present significant challenges that can reduce detection accuracy. Furthermore, most existing methods, which rely on convolutional neural networks (CNNs), struggle to detect small oil spill areas due to their limited receptive fields and inability to effectively capture global contextual information. This study explores the potential of State-Space Models (SSMs), particularly Mamba, to overcome these limitations, building on their recent success in vision applications. We propose OSDMamba, the first Mamba-based architecture specifically designed for oil spill detection. OSDMamba leverages Mamba's selective scanning mechanism to effectively expand the model's receptive field while preserving critical details. Moreover, we designed an asymmetric decoder incorporating ConvSSM and deep supervision to strengthen multi-scale feature fusion, thereby enhancing the model's sensitivity to minority class samples. Experimental results show that the proposed OSDMamba achieves state-of-the-art performance, yielding improvements of 8.9% and 11.8% in OSD across two publicly available datasets.
CVDec 18, 2024
A Black-Box Evaluation Framework for Semantic Robustness in Bird's Eye View DetectionFu Wang, Yanghao Zhang, Xiangyu Yin et al.
Camera-based Bird's Eye View (BEV) perception models receive increasing attention for their crucial role in autonomous driving, a domain where concerns about the robustness and reliability of deep learning have been raised. While only a few works have investigated the effects of randomly generated semantic perturbations, aka natural corruptions, on the multi-view BEV detection task, we develop a black-box robustness evaluation framework that adversarially optimises three common semantic perturbations: geometric transformation, colour shifting, and motion blur, to deceive BEV models, serving as the first approach in this emerging field. To address the challenge posed by optimising the semantic perturbation, we design a smoothed, distance-based surrogate function to replace the mAP metric and introduce SimpleDIRECT, a deterministic optimisation algorithm that utilises observed slopes to guide the optimisation process. By comparing with randomised perturbation and two optimisation baselines, we demonstrate the effectiveness of the proposed framework. Additionally, we provide a benchmark on the semantic robustness of ten recent BEV models. The results reveal that PolarFormer, which emphasises geometric information from multi-view images, exhibits the highest robustness, whereas BEVDet is fully compromised, with its precision reduced to zero.
GEO-PHDec 9, 2024
Geological and Well prior assisted full waveform inversion using conditional diffusion modelsFu Wang, Xinquan Huang, Tariq Alkhalifah
Full waveform inversion (FWI) often faces challenges due to inadequate seismic observations, resulting in band-limited and geologically inaccurate inversion results. Incorporating prior information from potential velocity distributions, well-log information, and our geological knowledge and expectations can significantly improve FWI convergence to a realistic model. While diffusion-regularized FWI has shown improved performance compared to conventional FWI by incorporating the velocity distribution prior, it can benefit even more by incorporating well-log information and other geological knowledge priors. To leverage this fact, we propose a geological class and well-information prior-assisted FWI using conditional diffusion models. This method seamlessly integrates multi-modal information into FWI, simultaneously achieving data fitting and universal geologic and geophysics prior matching, which is often not achieved with traditional regularization methods. Specifically, we propose to combine conditional diffusion models with FWI, where we integrate well-log data and geological class conditions into these conditional diffusion models using classifier-free guidance for multi-modal prior matching beyond the original velocity distribution prior. Numerical experiments on the OpenFWI datasets and field marine data demonstrate the effectiveness of our method compared to conventional FWI and the unconditional diffusion-regularized FWI.
LGMar 4, 2021
Dynamic Efficient Adversarial Training Guided by Gradient MagnitudeFu Wang, Yanghao Zhang, Yanbin Zheng et al.
Adversarial training is an effective but time-consuming way to train robust deep neural networks that can withstand strong adversarial attacks. As a response to its inefficiency, we propose Dynamic Efficient Adversarial Training (DEAT), which gradually increases the adversarial iteration during training. We demonstrate that the gradient's magnitude correlates with the curvature of the trained model's loss landscape, allowing it to reflect the effect of adversarial training. Therefore, based on the magnitude of the gradient, we propose a general acceleration strategy, M+ acceleration, which enables an automatic and highly effective method of adjusting the training procedure. M+ acceleration is computationally efficient and easy to implement. It is suited for DEAT and compatible with the majority of existing adversarial training techniques. Extensive experiments have been done on CIFAR-10 and ImageNet datasets with various training environments. The results show that the proposed M+ acceleration significantly improves the training efficiency of existing adversarial training methods while achieving similar robustness performance. This demonstrates that the strategy is highly adaptive and offers a valuable solution for automatic adversarial training.