NA, Mar 15, 2013
A Conservative Finite Difference Scheme for Poisson-Nernst-Planck Equations
Allen Flavell, Michael Machen, Bob Eisenberg et al.
The Poisson-Nernst-Planck (PNP) equations are a macroscopic model for the dynamics of ion transport in ion channels. In this paper, we develop a finite-difference method for solving the PNP equations that is second-order accurate in both space and time. We use physical parameters specifically suited to the modelling of ion channels. We present a simple iterative scheme to solve the system of nonlinear equations resulting from discretizing the equations implicitly in time, which is demonstrated to converge in a few iterations. We place emphasis on ensuring that the numerical method has the same physical properties as the PNP equations themselves, namely conservation of total ions and correct rates of energy dissipation. We describe in detail an approach to derive a finite-difference method that preserves the total concentration of ions exactly in time. Further, we illustrate that, using realistic values of the physical parameters, the conservation property is critical in obtaining correct numerical solutions over long time scales.
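To illustrate why a flux-form discretization conserves total concentration, here is a minimal sketch. It is not the scheme from the paper (which is implicit and second-order in time, and includes the electric potential); it assumes pure diffusion, explicit Euler time stepping, and no-flux walls. The function names `diffuse_conservative` and `total_mass` are hypothetical.

```python
def diffuse_conservative(c, D, dx, dt, steps):
    """Explicit flux-form diffusion with no-flux walls (illustrative only)."""
    c = list(c)
    n = len(c)
    for _ in range(steps):
        # Fluxes live on cell faces; faces 0 and n are the walls, where the flux is zero.
        flux = [0.0] * (n + 1)
        for i in range(1, n):
            flux[i] = -D * (c[i] - c[i - 1]) / dx
        # Each interior flux leaves one cell and enters its neighbour,
        # so the sum over cells telescopes and total mass is unchanged.
        c = [c[i] - dt / dx * (flux[i + 1] - flux[i]) for i in range(n)]
    return c


def total_mass(c, dx):
    """Discrete total concentration: sum of cell values times cell width."""
    return sum(c) * dx


if __name__ == "__main__":
    dx, D = 0.1, 1.0
    dt = 0.4 * dx * dx / D  # within the explicit stability limit dt <= dx^2 / (2 D)
    c0 = [1.0 if 3 <= i < 6 else 0.0 for i in range(20)]
    c1 = diffuse_conservative(c0, D, dx, dt, steps=500)
    print(total_mass(c0, dx), total_mass(c1, dx))
```

Because every update is a difference of face fluxes, conservation holds to rounding error regardless of the time step; stability is what restricts `dt` in this explicit variant.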
IV, Oct 8, 2023
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
Jianing Qiu, Jian Wu, Hao Wei et al.
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.
NA, Jul 2, 2018
Numerical methods for Porous Medium Equation by an Energetic Variational Approach
Chenghua Duan, Chun Liu, Cheng Wang et al.
We study numerical methods for the porous medium equation (PME). Two important characteristics make the problem difficult to handle: the finite propagation speed of the free boundary and the potential waiting time. Based on different dissipative energy laws, we develop two numerical schemes by an energetic variational approach. First, based on $f \log f$ as the total energy form of the dissipative law, we obtain the trajectory equation and then construct a fully discrete scheme. It is proved that the scheme is uniquely solvable on an admissible convex set by taking advantage of the singularity of the total energy. Next, based on $\frac{1}{2 f}$ as the total energy form of the dissipation law, we construct a linear numerical scheme for the corresponding trajectory equation. Both schemes preserve the corresponding discrete dissipation law. Meanwhile, under some smoothness assumptions, it is proved, by a higher-order expansion technique, that both schemes are second-order convergent in space and first-order convergent in time. Each scheme yields a good approximation of the solution and the free boundary. No oscillation is observed in the numerical solution around the free boundary. Furthermore, the waiting time problem can be treated naturally, which has been a well-known difficulty for all existing methods. Due to its linear nature, the second scheme is more efficient.
NA, Mar 21, 2017
The Derivation and Approximation of Coarse-grained Dynamics from Langevin Dynamics
Lina Ma, Xiantao Li, Chun Liu
We present a derivation of a coarse-grained model from the Langevin dynamics. The focus is placed on the memory kernel function and the fluctuation-dissipation theorem. Also presented is a hierarchy of approximations for the memory and random noise terms, using rational approximations in the Laplace domain. These approximations offer increasing accuracy. More importantly, they eliminate the need to evaluate the integral associated with the memory term at each time step.
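The last point can be illustrated with the simplest case, sketched here under stated assumptions rather than taken from the paper. Assume a single exponential kernel $\theta(t) = a e^{-bt}$, whose Laplace transform $a/(\lambda + b)$ is a first-order rational function. The memory term $m(t) = \int_0^t \theta(t-s)\, v(s)\, ds$ then satisfies the local equation $m' = -b\,m + a\,v(t)$ with $m(0) = 0$, so no history needs to be stored. The function names and the test signal `math.cos` are hypothetical choices for illustration.

```python
import math


def memory_by_quadrature(a, b, v, t, n=5000):
    """Trapezoidal evaluation of m(t) = integral_0^t a*exp(-b*(t-s))*v(s) ds."""
    h = t / n
    total = 0.0
    for k in range(n + 1):
        s = k * h
        w = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += w * a * math.exp(-b * (t - s)) * v(s)
    return total * h


def memory_by_auxiliary_ode(a, b, v, t, n=500):
    """RK4 integration of m' = -b*m + a*v(t), m(0) = 0; no history is stored."""
    h = t / n
    m, s = 0.0, 0.0

    def rhs(s, m):
        return -b * m + a * v(s)

    for _ in range(n):
        k1 = rhs(s, m)
        k2 = rhs(s + 0.5 * h, m + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h, m + 0.5 * h * k2)
        k4 = rhs(s + h, m + h * k3)
        m += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        s += h
    return m


if __name__ == "__main__":
    a, b, t = 2.0, 3.0, 5.0
    print(memory_by_quadrature(a, b, math.cos, t))
    print(memory_by_auxiliary_ode(a, b, math.cos, t))
```

The quadrature must be redone over the whole history for every new time `t`, whereas the auxiliary variable is advanced one step at a time; higher-order rational approximations lead to larger auxiliary systems of the same kind.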
NA, Mar 26, 2018
Numerical Complete Solution for Random Genetic Drift by Energetic Variational Approach
Chenghua Duan, Chun Liu, Cheng Wang et al.
In this paper, we focus on numerical solutions for the random genetic drift problem, which is governed by a degenerate convection-dominated parabolic equation. Due to the fixation phenomenon of genes, Dirac delta singularities will develop at boundary points as time evolves. Based on an energetic variational approach (EnVarA), a balance between the maximal dissipation principle (MDP) and the least action principle (LAP), we obtain the trajectory equation. In turn, a numerical scheme is proposed using a convex splitting technique, with the unique solvability (on a convex set) and the energy decay property (in time) justified at a theoretical level. Numerical examples are presented for cases of pure drift and drift with semi-selection. The remarkable advantage of this method is its ability to capture the Dirac delta singularity close to machine precision over any equidistant grid.
CV, Mar 6, 2023, Code
Faster Learning of Temporal Action Proposal via Sparse Multilevel Boundary Generator
Qing Song, Yang Zhou, Mengjie Hu et al.
Temporal action localization in videos presents significant challenges in the field of computer vision. While the boundary-sensitive method has been widely adopted, its limitations include incomplete use of intermediate and global information, as well as an inefficient proposal feature generator. To address these challenges, we propose a novel framework, Sparse Multilevel Boundary Generator (SMBG), which enhances the boundary-sensitive method with boundary classification and action completeness regression. SMBG features a multi-level boundary module that enables faster processing by gathering boundary information at different lengths. Additionally, we introduce a sparse extraction confidence head that distinguishes information inside and outside the action, further optimizing the proposal feature generator. To improve the synergy between multiple branches and balance positive and negative samples, we propose a global guidance loss. Our method is evaluated on two popular benchmarks, ActivityNet-1.3 and THUMOS14, and is shown to achieve state-of-the-art performance, with a better inference speed (2.47x BSN++, 2.12x DBG). These results demonstrate that SMBG provides a more efficient and simple solution for generating temporal action proposals. Our proposed framework has the potential to advance the field of computer vision and enhance the accuracy and speed of temporal action localization in video analysis. The code and models are made available at https://github.com/zhouyang-001/SMBG-for-temporal-action-proposal.
NA, Mar 15, 2015
Energetically stable discretizations for charge carrier transport and electrokinetic models
Chun Liu, Maximilian Metti, Jinchao Xu
A finite element discretization using a method of lines approach is proposed for approximately solving the Poisson-Nernst-Planck (PNP) equations. This discretization scheme enforces positivity of the computed solutions, corresponding to particle density functions, and a discrete energy estimate is established that resembles the familiar energy law for the PNP system. This energy estimate is extended to finite element solutions to an electrokinetic model, which couples the PNP system with the Navier-Stokes equations. Numerical experiments are conducted to validate convergence of the computed solution and verify the discrete energy estimate.
NA, Nov 25, 2018
Coarse-graining Langevin dynamics using reduced-order techniques
Lina Ma, Xiantao Li, Chun Liu
This paper considers the reduction of the Langevin equation arising from bio-molecular models. To facilitate the construction and implementation of the reduced models, the problem is formulated as a reduced-order modeling problem. The reduced models can then be directly obtained from a Galerkin projection to appropriately defined Krylov subspaces. The equivalence to a moment-matching procedure, previously implemented in , 2), is proved. A particular emphasis is placed on the reduction of the stochastic noise, which is absent in many order-reduction problems. In particular, for order less than six we can show the reduced model obtained from the subspace projection automatically satisfies the fluctuation-dissipation theorem. Details for the implementations, including a bi-orthogonalization procedure and the minimization of the number of matrix multiplications, will be discussed as well.
NA, Mar 21, 2017
From Generalized Langevin Equations to Brownian Dynamics and Embedded Brownian Dynamics
Lina Ma, Xiantao Li, Chun Liu
We present the reduction of generalized Langevin equations to a coordinate-only stochastic model, which, in its exact form, involves a forcing term with memory and a general Gaussian noise. It will be shown that a similar fluctuation-dissipation theorem still holds at this level. We study the approximation by the typical Brownian dynamics as a first approximation. Our numerical test indicates how the intrinsic frequency of the kernel function influences the accuracy of this approximation. In the case when such an approximation is inadequate, further approximations can be derived by embedding the nonlocal model into an extended dynamics without memory. By imposing noise on the auxiliary variables, we show how the second fluctuation-dissipation theorem is still exactly satisfied.
70.6, LG, May 8, Code
Efficient Verification of Neural Control Barrier Functions with Smooth Nonlinear Activations
Jun Zhang, Haibo Zhang, Chun Liu et al.
Formal verification of neural control barrier functions (NCBFs) remains challenging, especially for neural networks with nonlinear activations like $\tanh$. Existing CROWN-based methods rely on conservative linear relaxations for Jacobian bounds, limiting scalability. We propose LightCROWN, which computes tighter Jacobian bounds by exploiting the analytical properties of activation functions. Experiments on nonlinear control systems including the inverted pendulum, Dubins car, and planar quadrotor demonstrate that LightCROWN improves verification success rates up to 100%, while enhancing speed and scalability. Our approach provides a generalizable improvement for CROWN-based frameworks, enabling more efficient verification of complex NCBFs. The code can be found at github.com/Autonomous-Systems-and-Control-Lab/verify-neural-CBF.
NA, Dec 12, 2016
Behavior of different numerical schemes for population genetic drift problems
Minxin Chen, Chun Liu, Shixin Xu et al.
In this paper, we focus on numerical methods for genetic drift problems, which are governed by a degenerate convection-dominated parabolic equation. Due to the degeneration and convection, Dirac singularities will always develop at boundary points as time evolves. In order to find a complete solution, which should conserve total probability and expectation, three different schemes based on finite volume methods are used to solve the equation numerically: one is an upwind scheme, and the other two are different central schemes. We observe that all the methods are stable and conserve the total probability, but have totally different long-time behaviors with respect to the conservation of expectation. We prove that any extra infinitesimal diffusion leads to the same artificial steady state. The upwind scheme therefore does not work, due to its intrinsic numerical viscosity. We find that one of the central schemes introduces a numerical viscosity term too, which is beyond the common understanding in the convection-diffusion community. Careful analysis is presented to prove that the other central scheme does work. Our study shows that numerical methods should be carefully chosen and that any method with intrinsic numerical viscosity must be avoided.
CV, Nov 2, 2023
Multi-level Relation Learning for Cross-domain Few-shot Hyperspectral Image Classification
Chun Liu, Longwei Yang, Zheng Li et al.
Cross-domain few-shot hyperspectral image classification focuses on learning prior knowledge from a large number of labeled samples from source domains and then transferring the knowledge to tasks that contain few labeled samples in target domains. Following the metric-based approach, many current methods first extract the features of the query and support samples, and then directly predict the classes of query samples according to their distance to the support samples or prototypes. The relations between samples have not been fully explored and utilized. Unlike current works, this paper proposes to learn sample relations on different levels and incorporate them into the model learning process, to improve cross-domain few-shot hyperspectral image classification. Building on the existing method "Deep Cross-Domain Few-Shot Learning for Hyperspectral Image Classification", which adopts a domain discriminator to deal with domain-level distribution difference, the proposed method applies contrastive learning to learn the class-level sample relations and obtain more discriminable sample features. In addition, it adopts a transformer-based cross-attention learning module to learn the set-level sample relations and acquire the attention from query samples to support samples. Our experimental results demonstrate the contribution of the multi-level relation learning mechanism to few-shot hyperspectral image classification when compared with state-of-the-art methods.
CV, Mar 12, 2023
CoT-MISR: Marrying Convolution and Transformer for Multi-Image Super-Resolution
Mingming Xiu, Yang Nie, Qing Song et al.
Image super-resolution has been studied extensively as a method of image restoration. How to recover high-resolution image information from a low-resolution image is a problem that researchers have long explored. Early physical transformation methods produced high-resolution images with serious information loss, and edges and details could not be recovered well. With the development of hardware and mathematics, deep learning methods have been applied to image super-resolution, from plain deep learning models, residual channel attention networks, and bi-directional suppression networks to networks with transformer modules, and these have gradually achieved good results. In multi-image super-resolution, thanks to the establishment of multi-image super-resolution datasets, the field has evolved from convolutional models to transformer models, and the quality of super-resolution has continuously improved. However, we find that neither a pure convolutional network nor a pure transformer network can make good use of low-resolution image information. Based on this, we propose a new end-to-end CoT-MISR network. The CoT-MISR network complements local and global information by combining the advantages of convolution and transformers. Validation on the dataset under equal parameter counts shows that our CoT-MISR network achieves the best score.
CV, Nov 4, 2022
UV R-CNN: Stable and Efficient Dense Human Pose Estimation
Wenhe Jia, Yilin Zhou, Xuhan Zhu et al.
Dense pose estimation is a dense 3D prediction task for instance-level human analysis, aiming to map human pixels from an RGB image to a 3D surface of the human body. Because of the large number of surface points to regress, training collapses more easily than in other region-based human instance analysis tasks. By analyzing the loss formulation of the existing dense pose estimation model, we introduce a novel point regression loss function, named Dense Points loss, to stabilize training, and a new balanced loss weighting strategy to handle the multi-task losses. With the above novelties, we propose a brand new architecture, named UV R-CNN. Without auxiliary supervision and external knowledge from other tasks, UV R-CNN can handle many complicated issues in dense pose model training, achieving 65.0% $AP_{gps}$ and 66.1% $AP_{gpsm}$ on the DensePose-COCO validation subset with a ResNet-50-FPN feature extractor, which is competitive among state-of-the-art dense human pose estimation methods.
CV, Aug 16, 2022
SGM-Net: Semantic Guided Matting Net
Qing Song, Wenfeng Sun, Donghan Yang et al.
Human matting refers to extracting human parts from natural images with high quality, including human detail information such as hair, glasses, hat, etc. This technology plays an essential role in image synthesis and visual effects in the film industry. When a green screen is not available, existing human matting methods need either additional inputs (such as a trimap or background image) or a model with high computational cost and a complex network structure, which makes human matting difficult to apply in practice. To alleviate such problems, most existing methods (such as MODNet) use multiple branches to pave the way for matting through segmentation, but these methods do not make full use of the image features and only utilize the prediction results of the network as guidance information. Therefore, we propose a module that generates a foreground probability map and add it to MODNet to obtain Semantic Guided Matting Net (SGM-Net). With only a single image as input, we can perform the human matting task. We verify our method on the P3M-10k dataset. Compared with the benchmark, our method improves significantly on various evaluation metrics.
LG, Nov 14, 2023
Iterative missing value imputation based on feature importance
Cong Guo, Chun Liu, Wei Yang
Many datasets suffer from missing values due to various reasons, which not only increases the processing difficulty of related tasks but also reduces the accuracy of classification. To address this problem, the mainstream approach is to use missing value imputation to complete the dataset. Existing imputation methods estimate the missing parts based on the observed values in the original feature space, and they treat all features as equally important during data completion, while in fact different features have different importance. Therefore, we have designed an imputation method that considers feature importance. This algorithm iteratively performs matrix completion and feature importance learning; specifically, matrix completion is based on a filling loss that incorporates feature importance. Our experimental analysis involves three types of datasets: synthetic datasets with different noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets originally containing missing values. The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms. To the best of our knowledge, this is the first work that considers feature importance in the imputation model.
CV, Aug 25, 2025, Code
Few-shot Unknown Class Discovery of Hyperspectral Images with Prototype Learning and Clustering
Chun Liu, Chen Zhang, Zhuo Li et al.
Open-set few-shot hyperspectral image (HSI) classification aims to classify image pixels using few labeled pixels per class, where the pixels to be classified may not all come from classes that have been seen. To address the open-set HSI classification challenge, current methods focus mainly on distinguishing the unknown class samples from the known class samples and rejecting them to increase the accuracy of identifying known class samples. They fail to further identify or discover the unknown classes among the samples. This paper proposes a prototype learning and clustering method for discovering unknown classes in HSIs in the few-shot setting. Using few labeled samples, it strives to develop the ability to infer the prototypes of unknown classes while distinguishing unknown classes from known classes. Once the unknown class samples are rejected by the learned known class classifier, the proposed method can further cluster the unknown class samples into different classes according to their distance to the inferred unknown class prototypes. Extensive experiments on four benchmark HSI datasets demonstrate that our proposed method exhibits competitive performance in open-set few-shot HSI classification tasks compared to existing state-of-the-art methods. All the code is available at https://github.com/KOBEN-ff/OpenFUCD-main.
CL, Jun 6, 2024, Code
LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification
Chun Liu, Hongguang Zhang, Kainan Zhao et al.
With the boom in Large Language Models (LLMs), prompt-learning has become a promising method studied in various research areas. Recently, many attempts based on prompt-learning have been made to improve the performance of text classification. However, most of these methods are based on heuristic Chain-of-Thought (CoT), and tend to be more complex but less efficient. In this paper, we rethink the LLM-based text classification methodology and propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task. To illustrate, we first study how to properly extract and fuse the text embeddings via various lightweight LLMs at different network depths to improve their robustness and discrimination, then adapt such embeddings to train the classifier. We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoying low training overhead using lightweight LLM backbones, compared to recent methods based on larger LLMs, e.g. GPT-3, and sophisticated prompt-based strategies. Our LLMEmbed achieves adequate accuracy on publicly available benchmarks without any fine-tuning while using merely 4% of the model parameters, 1.8% of the electricity consumption and 1.5% of the runtime of its counterparts. Code is available at: https://github.com/ChunLiu-cs/LLMEmbed-ACL2024.
CV, May 31, 2023, Code
Multi-level Cross-modal Feature Alignment via Contrastive Learning towards Zero-shot Classification of Remote Sensing Image Scenes
Chun Liu, Suqiang Ma, Zheng Li et al.
Zero-shot classification of image scenes, which can recognize image scenes that are not seen in the training stage, holds great promise for lowering the dependence on large numbers of labeled samples. To address zero-shot image scene classification, cross-modal feature alignment methods have been proposed in recent years. These methods mainly focus on matching the visual features of each image scene with its corresponding semantic descriptors in the latent space. Less attention has been paid to the contrastive relationships between different image scenes and different semantic descriptors. In light of the challenge of large intra-class difference and inter-class similarity among image scenes and the potential noisy samples, these methods are susceptible to the influence of instances that are far from those of the same class and close to those of other classes. In this work, we propose a multi-level cross-modal feature alignment method via contrastive learning for zero-shot classification of remote sensing image scenes. While promoting the single-instance level positive alignment between each image scene and its corresponding semantic descriptors, the proposed method takes the cross-instance contrastive relationships into consideration, and learns to keep the visual and semantic features of different classes in the latent space apart from each other. Extensive experiments have been done to evaluate the performance of the proposed method. The results show that our proposed method outperforms state-of-the-art methods for zero-shot remote sensing image scene classification. All the code and data are available on GitHub at https://github.com/masuqiang/MCFA-Pytorch
CV, Sep 20, 2020, Code
Renovating Parsing R-CNN for Accurate Multiple Human Parsing
Lu Yang, Qing Song, Zhihui Wang et al.
Multiple human parsing aims to segment various human parts and associate each part with the corresponding instance simultaneously. This is a very challenging task due to the diverse human appearance, semantic ambiguity of different body parts, and complex background. Through analysis of the multiple human parsing task, we observe that human-centric global perception and accurate instance-level parsing scoring are crucial for obtaining high-quality results. However, most state-of-the-art methods have not paid enough attention to these issues. To address this, we present Renovating Parsing R-CNN (RP R-CNN), which introduces a global semantic enhanced feature pyramid network and a parsing re-scoring network into the existing high-performance pipeline. The proposed RP R-CNN adopts global semantic representation to enhance multi-scale features for generating human parsing maps, and regresses a confidence score to represent its quality. Extensive experiments show that RP R-CNN performs favorably against state-of-the-art methods on CIHP and MHP-v2 datasets. Code and models are available at https://github.com/soeaver/RP-R-CNN.
53.3, LG, Apr 22
Structure-Aware Variational Learning of a Class of Generalized Diffusions
Yubin Lu, Xiaofan Li, Chun Liu et al.
Learning the underlying potential energy of stochastic gradient systems from partial and noisy observations is a fundamental problem arising in physics, chemistry, and data-driven modeling. Classical approaches often rely on direct regression of governing equations or velocity fields, which can be sensitive to noise and external perturbations and may fail when observations are incomplete. In this work, we propose a structure-aware, energy-based learning framework for inferring unknown potential functions in generalized diffusion processes, grounded in the energetic variational approach. Starting from the energy-dissipation law associated with the Fokker-Planck equation, we construct loss functions based on the De Giorgi dissipation functional, which consistently couple the free energy and the dissipation mechanism of the system. This formulation avoids explicit enforcement of the governing partial differential equation and preserves the underlying variational structure of the dynamics. Through numerical experiments in one, two, and three dimensions, we demonstrate that the proposed energy-based loss exhibits enhanced robustness with respect to observation time, noise level, and the diversity and amount of available training data. These results highlight the effectiveness of energy-dissipation principles as a reliable foundation for learning stochastic diffusion dynamics from data.
CV, Jan 22, 2024
Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image Classification
Chun Liu, Longwei Yang, Dongmei Dong et al.
Few-shot hyperspectral image classification aims to identify the class of each pixel in the images by labeling only a few of these pixels. To obtain the spatial-spectral joint features of each pixel, fixed-size patches centered on each pixel are often used for classification. However, observing the classification results of existing methods, we found that boundary patches, which correspond to pixels located at the boundaries of objects in the hyperspectral images, are hard to classify. These boundary patches are mixed with multi-class spectral information. Inspired by this, we propose to augment the prototype network with TransMix for few-shot hyperspectral image classification (APNT). While taking the prototype network as the backbone, it adopts the transformer as the feature extractor to learn the pixel-to-pixel relation and pay different attention to different pixels. At the same time, instead of directly using the patches cut from the hyperspectral images for training, it randomly mixes up two patches to imitate the boundary patches and uses the synthetic patches to train the model, with the aim of enlarging the number of hard training samples and enhancing their diversity. Following the data augmentation technique TransMix, the attention returned by the transformer is also used to mix up the labels of two patches to generate better labels for synthetic patches. Compared with existing methods, the proposed method has demonstrated state-of-the-art performance and better robustness for few-shot hyperspectral image classification in our experiments.
SD, Feb 2
When Noise Lowers The Loss: Rethinking Likelihood-Based Evaluation in Music Large Language Models
Xiaosha Li, Chun Liu, Ziyu Wang
The rise of music large language models (LLMs) demands robust methods of evaluating output quality, especially in distinguishing high-quality compositions from "garbage music". Curiously, we observe that the standard cross-entropy loss -- a core training metric -- often decreases when models encounter systematically corrupted music, undermining its validity as a standalone quality indicator. To investigate this paradox, we introduce a noise injection experiment, where controlled noise signals of varying lengths are injected into musical contexts. We hypothesize that a model's loss reacting positively to these perturbations, specifically a sharp increase ("Peak" area) for short injection, can serve as a proxy for its ability to discern musical integrity. Experiments with MusicGen models in the audio waveform domain confirm that Music LLMs respond more strongly to local, texture-level disruptions than to global semantic corruption. Beyond exposing this bias, our results highlight a new principle: the shape of the loss curve -- rather than its absolute value -- encodes critical information about the quality of the generated content (i.e., model behavior). We envision this profile-based evaluation as a label-free, model-intrinsic framework for assessing musical quality -- opening the door to more principled training objectives and sharper benchmarks.
CV, Sep 9, 2025
Generating Transferrable Adversarial Examples via Local Mixing and Logits Optimization for Remote Sensing Object Recognition
Chun Liu, Hailong Wang, Bingqian Zhu et al.
Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, posing significant security threats to their deployment in remote sensing applications. Research on adversarial attacks not only reveals model vulnerabilities but also provides critical insights for enhancing robustness. Although current mixing-based strategies have been proposed to increase the transferability of adversarial examples, they either perform global blending or directly exchange a region in the images, which may destroy global semantic features and mislead the optimization of adversarial examples. Furthermore, their reliance on cross-entropy loss for perturbation optimization leads to gradient diminishing during iterative updates, compromising adversarial example quality. To address these limitations, we focus on non-targeted attacks and propose a novel framework via local mixing and logits optimization. First, we present a local mixing strategy to generate diverse yet semantically consistent inputs. Different from MixUp, which globally blends two images, and MixCut, which stitches images together, our method merely blends local regions to preserve global semantic information. Second, we adapt the logit loss from targeted attacks to non-targeted scenarios, mitigating the gradient vanishing problem of cross-entropy loss. Third, a perturbation smoothing loss is applied to suppress high-frequency noise and enhance transferability. Extensive experiments on FGSCR-42 and MTARSI datasets demonstrate superior performance over 12 state-of-the-art methods across 6 surrogate models. Notably, with ResNet as the surrogate on MTARSI, our method achieves a 17.28% average improvement in black-box attack success rate.
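The contrast between global blending and local mixing can be sketched as follows. This is only an illustration of the idea of blending a single local region, not the authors' implementation: the rectangle parameters, the fixed blend weight `lam`, and the function name `local_mix` are all assumptions, and images are plain nested lists of floats to keep the example self-contained.

```python
def local_mix(img_a, img_b, top, left, height, width, lam):
    """Blend img_b into img_a inside one rectangle only.

    Pixels outside the rectangle are copied from img_a unchanged, so the
    global content of img_a is preserved (hypothetical helper).
    """
    out = [row[:] for row in img_a]  # copy so the input image is not modified
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = lam * img_a[r][c] + (1.0 - lam) * img_b[r][c]
    return out


if __name__ == "__main__":
    a = [[1.0] * 4 for _ in range(4)]
    b = [[0.0] * 4 for _ in range(4)]
    for row in local_mix(a, b, top=1, left=1, height=2, width=2, lam=0.75):
        print(row)
```

Setting the rectangle to the whole image recovers a MixUp-style global blend, which is the case the abstract argues can damage global semantic features.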
CV Aug 29, 2025
Adversarial Patch Attack for Ship Detection via Localized Augmentation
Chun Liu, Panpan Ding, Zheng Zheng et al.
Current ship detection techniques based on remote sensing imagery primarily rely on the object detection capabilities of deep neural networks (DNNs). However, DNNs are vulnerable to adversarial patch attacks, which can lead to misclassification by the detection model or complete evasion of the targets. Numerous studies have demonstrated that data transformation-based methods can improve the transferability of adversarial examples. However, excessive augmentation of image backgrounds or irrelevant regions may introduce unnecessary interference, resulting in false detections of the object detection model. These errors are not caused by the adversarial patches themselves but rather by the over-augmentation of background and non-target areas. This paper proposes a localized augmentation method that applies augmentation only to the target regions, avoiding any influence on non-target areas. By reducing background interference, this approach enables the loss function to focus more directly on the impact of the adversarial patch on the detection model, thereby improving the attack success rate. Experiments conducted on the HRSC2016 dataset demonstrate that the proposed method effectively increases the success rate of adversarial patch attacks and enhances their transferability.
ML Jun 26, 2025
Active Learning for Manifold Gaussian Process Regression
Yuanxing Cheng, Lulu Kang, Yiwei Wang et al.
This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, supervised by an active learning criterion that minimizes global prediction error. Experiments on synthetic data demonstrate superior performance over random sequential learning. The framework efficiently handles complex, discontinuous functions while preserving computational tractability, offering practical value for scientific and engineering applications. Future work will focus on scalability and uncertainty-aware manifold learning.
CV Jun 12, 2025
Boosting Adversarial Transferability for Hyperspectral Image Classification Using 3D Structure-invariant Transformation and Weighted Intermediate Feature Divergence
Chun Liu, Bingqian Zhu, Tao Xu et al.
Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, which pose security challenges to hyperspectral image (HSI) classification based on DNNs. Numerous adversarial attack methods have been designed in the domain of natural images. However, unlike natural images, HSIs contain rich, high-dimensional spectral information, which presents new challenges for generating adversarial examples. Based on the specific characteristics of HSIs, this paper proposes a novel method to enhance the transferability of adversarial examples for HSI classification using 3D structure-invariant transformation and weighted intermediate feature divergence. While keeping the HSI structure invariant, the proposed method divides the image into blocks in both the spatial and spectral dimensions. Then, various transformations are applied to each block to increase input diversity and mitigate overfitting to substitute models. Moreover, a weighted intermediate feature divergence loss is designed by leveraging the differences between the intermediate features of original and adversarial examples. It constrains the perturbation direction by enlarging the feature maps of the original examples, and assigns different weights to different feature channels to destroy the features that have a greater impact on HSI classification. Extensive experiments demonstrate that the adversarial examples generated by the proposed method achieve more effective adversarial transferability on three public HSI datasets. Furthermore, the method maintains robust attack performance even under defense strategies.
ML Apr 4, 2025
Accelerating Particle-based Energetic Variational Inference
Xuelian Bao, Lulu Kang, Chun Liu et al.
In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates EVI-Im. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. EVI-Im employs the implicit Euler method to solve variational-preserving particle dynamics for minimizing the KL divergence, derived using a "discretize-then-variational" approach. In contrast, the proposed algorithm avoids repeated evaluation of inter-particle interaction terms, significantly reducing computational cost. The framework is also extensible to other gradient-based sampling techniques. Through several numerical experiments, we demonstrate that our method outperforms existing ParVI approaches in efficiency, robustness, and accuracy.
CV Mar 15, 2024
E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance
Tianrui Huang, Pu Cao, Lu Yang et al.
Diffusion-based image editing is a composite process of preserving the source image content and generating new content or applying modifications. While current editing approaches have made improvements under text guidance, most of them have only focused on preserving the information of the input image, disregarding the importance of editability and alignment to the target prompt. In this paper, we prioritize editability by proposing a zero-shot image editing method, named Enhance Editability for text-based image Editing via Efficient CLIP guidance (E4C), which only requires inference-stage optimization to explicitly enhance editability and text alignment. Specifically, we develop a unified dual-branch feature-sharing pipeline that enables the preservation of the structure or texture of the source image while allowing the other to be adapted based on the editing task. We further integrate CLIP guidance into our pipeline by utilizing our novel random-gateway optimization mechanism to efficiently enhance the semantic alignment with the target prompt. Comprehensive quantitative and qualitative experiments demonstrate that our method effectively resolves the text alignment issues prevalent in existing methods while maintaining fidelity to the source image, and performs well across a wide range of editing tasks.
ML Nov 21, 2021
A Deterministic Sampling Method via Maximum Mean Discrepancy Flow with Adaptive Kernel
Yindong Chen, Yiwei Wang, Lulu Kang et al.
We propose a novel deterministic sampling method to approximate a target distribution $\rho^*$ by minimizing the kernel discrepancy, also known as the Maximum Mean Discrepancy (MMD). By employing the general energetic variational inference framework (Wang et al., 2021), we convert the problem of minimizing MMD to solving a dynamic ODE system of the particles. We adopt the implicit Euler numerical scheme to solve the ODE system. This leads to a proximal minimization problem in each iteration of updating the particles, which can be solved by optimization algorithms such as L-BFGS. The proposed method is named EVI-MMD. To overcome the long-standing issue of bandwidth selection for the Gaussian kernel, we propose a novel way to specify the bandwidth dynamically. Through comprehensive numerical studies, we show that the proposed adaptive bandwidth significantly improves EVI-MMD. We use the EVI-MMD algorithm to solve two types of sampling problems. In the first type, the target distribution is given by a fully specified density function. The second type is a "two-sample problem", where only training data are available. Here, the EVI-MMD method is used as a generative learning model to generate new samples that follow the same distribution as the training data. With the recommended settings of the tuning parameters, we show that the proposed EVI-MMD method outperforms some existing methods for both types of problems.
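For readers unfamiliar with the quantity being minimized, the sketch below computes a standard biased (V-statistic) estimate of the squared MMD between two sample sets under a Gaussian kernel. It only illustrates the discrepancy itself, for the "two-sample" setting; it is not the EVI-MMD particle update, and the fixed `bandwidth` argument is an assumption that stands in for the adaptive bandwidth the abstract proposes.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]."""

    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) - 2.0 * mean_kernel(xs, ys) + mean_kernel(ys, ys)


# Hypothetical one-dimensional particles and target samples.
particles = [(0.0,), (0.5,), (1.0,)]
targets = [(3.0,), (3.5,), (4.0,)]

print(mmd_squared(particles, particles))  # identical sample sets
print(mmd_squared(particles, targets))    # well-separated sample sets
```

The estimate vanishes when the two sample sets coincide and grows as they separate, which is why driving it down moves particles toward the target samples.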
CR May 12, 2021
An Efficient Matrix Multiplication with Enhanced Privacy Protection in Cloud Computing and Its Applications
Chun Liu, Xuexian Hu, Xiaofeng Chen et al.
As one of the most important basic operations, matrix multiplication computation (MMC) has a variety of applications in the scientific and engineering community, such as linear regression, k-nearest neighbor classification, and biometric identification. However, handling these tasks with large-scale datasets leads to huge computation beyond a resource-constrained client's computation power. With the rapid development of cloud computing, outsourcing intensive tasks to a cloud server has become a promising method. Because the cloud server is generally out of the control of clients, there are still many challenges concerning the privacy and security of clients' sensitive data. Motivated by this, Lei et al. presented an efficient encryption scheme based on random permutation to protect the privacy of the client's data in the outsourced MMC task. Nevertheless, there exist inherent security flaws in their scheme: it reveals statistical information about zero elements in the original data and thus does not satisfy computational indistinguishability (IND-ZEA). Aiming to enhance the security of the outsourced MMC task, we propose a new encryption scheme based on a subtly designed invertible matrix, where an additive perturbation is introduced in addition to the multiplicative perturbation. Furthermore, we show that the proposed encryption scheme can be applied not only to the MMC task but also to other kinds of outsourced tasks, such as linear regression and principal component analysis. Theoretical analyses and experiments indicate that our methods are more secure in terms of data privacy, with comparable performance to the state-of-the-art scheme based on matrix transformation.
ML Apr 14, 2020
Particle-based Energetic Variational Inference
Yiwei Wang, Jiuhai Chen, Chun Liu et al.
We introduce a new variational inference (VI) framework, called energetic variational inference (EVI). It minimizes the VI objective function based on a prescribed energy-dissipation law. Using the EVI framework, we can derive many existing Particle-based Variational Inference (ParVI) methods, including the popular Stein Variational Gradient Descent (SVGD) approach. More importantly, many new ParVI schemes can be created under this framework. For illustration, we propose a new particle-based EVI scheme, which performs the particle-based approximation of the density first and then uses the approximated density in the variational procedure, or "Approximation-then-Variation" for short. Thanks to this order of approximation and variation, the new scheme can maintain the variational structure at the particle level, and can significantly decrease the KL-divergence in each iteration. Numerical experiments show the proposed method outperforms some existing ParVI methods in terms of fidelity to the target distribution.
CV Mar 7, 2020
CPM R-CNN: Calibrating Point-guided Misalignment in Object Detection
Bin Zhu, Qing Song, Lu Yang et al.
In object detection, offset-guided and point-guided regression dominate anchor-based and anchor-free methods, respectively. Recently, the point-guided approach has been introduced to anchor-based methods. However, we observe that points predicted this way are misaligned with the matched region of proposals and the localization score, causing a notable gap in performance. In this paper, we propose CPM R-CNN, which contains three efficient modules to optimize the anchor-based point-guided method. Extensive evaluations on the COCO dataset show that CPM R-CNN improves localization accuracy by calibrating the aforementioned misalignment. Compared with Faster R-CNN and Grid R-CNN based on ResNet-101 with FPN, our approach substantially improves detection mAP by 3.3% and 1.5%, respectively, without bells and whistles. Moreover, our best model improves by a large margin to 49.9% on COCO test-dev. Code and models will be publicly available.
CV Jul 2, 2019
High-speed Railway Fastener Detection and Localization Method based on convolutional neural network
Qing Song, Yao Guo, Jianan Jiang et al.
Railway transportation is the artery of China's national economy and plays an important role in the development of today's society. Because China's railway security inspection technology started late, current railway security inspection tasks rely mainly on manual inspection, which is inefficient and requires a lot of manpower and material resources. In this paper, we establish a steel rail fastener detection image dataset, which contains 4,000 rail fastener pictures of 4 types. We use a region proposal network to generate regions of interest, extract features using a convolutional neural network, and fuse the classifier into the detection network. With online hard sample mining to improve the accuracy of the model, we optimize the Faster RCNN detection framework by reducing the number of regions of interest. Finally, the model accuracy reaches 99% and the speed reaches 35 FPS in a deployment environment with a TITAN X GPU.
CV May 4, 2019
Leveraging Crowdsourced GPS Data for Road Extraction from Aerial Imagery
Tao Sun, Zonglin Di, Pengyu Che et al.
Deep learning is revolutionizing the mapping industry. Under lightweight human curation, computers have generated almost half of the roads in Thailand on OpenStreetMap (OSM) using high-resolution aerial imagery. Bing Maps is displaying 125 million computer-generated building polygons in the U.S. While tremendously more efficient than manual mapping, one cannot map out everything from the air. Especially for roads, a small prediction gap caused by image occlusion renders the entire road useless for routing. Misconnections can be more dangerous. Therefore, computer-based mapping often requires local verification, which is still labor-intensive. In this paper, we propose to leverage crowdsourced GPS data to improve and support road extraction from aerial imagery. Through novel data augmentation, GPS rendering, and 1D transpose convolution techniques, we show almost 5% improvement over previous competition-winning models, and much better robustness when predicting new areas without any new training data or domain adaptation.
NE Jun 2, 2016
On the performance of different mutation operators of a subpopulation-based genetic algorithm for multi-robot task allocation problems
Chun Liu, Andreas Kroll
The performance of different mutation operators is usually evaluated in conjunction with specific parameter settings of genetic algorithms and target problems. Most studies focus on the classical genetic algorithm with different parameters or on solving unconstrained combinatorial optimization problems such as the traveling salesman problem. In this paper, a subpopulation-based genetic algorithm that uses only mutation and selection is developed to solve multi-robot task allocation problems. The target problems are constrained combinatorial optimization problems, and are more complex if cooperative tasks are involved, as these introduce additional spatial and temporal constraints. The proposed genetic algorithm can obtain better solutions than classical genetic algorithms with tournament selection and partially mapped crossover. The performance of different mutation operators in solving problems without and with cooperative tasks is evaluated. The results imply that inversion mutation performs better than others when solving problems without cooperative tasks, and the swap-inversion combination performs better than others when solving problems with cooperative tasks.
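The swap and inversion operators named in this abstract are standard mutations for permutation-coded chromosomes, and a minimal sketch of them follows. The chromosome encoding (a flat task order) and the way the two operators are combined in `swap_inversion_mutation` are assumptions for illustration; the abstract does not specify how the paper encodes task allocations or defines the combination.

```python
import random


def swap_mutation(chromosome, rng):
    """Exchange the genes at two distinct random positions."""
    child = list(chromosome)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def inversion_mutation(chromosome, rng):
    """Reverse the order of genes between two random positions (inclusive)."""
    child = list(chromosome)
    i, j = sorted(rng.sample(range(len(child)), 2))
    child[i:j + 1] = reversed(child[i:j + 1])
    return child


def swap_inversion_mutation(chromosome, rng):
    """Apply swap then inversion (an assumed way of combining the two)."""
    return inversion_mutation(swap_mutation(chromosome, rng), rng)


rng = random.Random(42)           # seeded for reproducibility
task_order = [0, 1, 2, 3, 4, 5]   # hypothetical order in which tasks are visited

print(swap_mutation(task_order, rng))
print(inversion_mutation(task_order, rng))
print(swap_inversion_mutation(task_order, rng))
```

Both operators return a permutation of the input, so a chromosome never gains or loses a task; feasibility with respect to spatial and temporal constraints would still have to be checked separately.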