NEApr 8, 2023
Improving Performance Insensitivity of Large-scale Multiobjective Optimization via Monte Carlo Tree SearchHaokai Hong, Min Jiang, Gary G. Yen
The large-scale multiobjective optimization problem (LSMOP) is characterized by simultaneously optimizing multiple conflicting objectives and involving hundreds of decision variables. Many real-world applications in engineering fields can be modeled as LSMOPs; simultaneously, engineering applications require insensitivity in performance. This requirement usually means that the results from the algorithm runs should not only be good for every run in terms of performance but also that the performance of multiple runs should not fluctuate too much, i.e., the algorithm shows good insensitivity. Considering that substantial computational resources are requested for each run, it is essential to improve upon the performance of the large-scale multiobjective optimization algorithm, as well as the insensitivity of the algorithm. However, existing large-scale multiobjective optimization algorithms solely focus on improving the performance of the algorithms, leaving the insensitivity characteristics unattended. In this work, we propose an evolutionary algorithm for solving LSMOPs based on Monte Carlo tree search, the so-called LMMOCTS, which aims to improve the performance and insensitivity for large-scale multiobjective optimization problems. The proposed method samples the decision variables to construct new nodes on the Monte Carlo tree for optimization and evaluation. It selects nodes with good evaluation for further search to reduce the performance sensitivity caused by large-scale decision variables. We compare the proposed algorithm with several state-of-the-art designs on different benchmark functions. We also propose two metrics to measure the sensitivity of the algorithm. The experimental results confirm the effectiveness and performance insensitivity of the proposed design for solving large-scale multiobjective optimization problems.
QUANT-PHMay 24, 2019
A gradient algorithm for Hamiltonian identification of open quantum systemsShibei Xue, Rebing Wu, Dewei Li et al.
In this paper, we present a gradient algorithm for identifying unknown parameters in an open quantum system from the measurements of time traces of local observables. The open system dynamics is described by a general Markovian master equation based on which the Hamiltonian identification problem can be formulated as minimizing the distance between the real time traces of the observables and those predicted by the master equation. The unknown parameters can then be learned with a gradient descent algorithm from the measurement data. We verify the effectiveness of our algorithm in a circuit QED system described by a Jaynes-Cumming model whose Hamiltonian identification has been rarely considered. We also show that our gradient algorithm can learn the spectrum of a non-Markovian environment based on an augmented system model.
LGNov 11, 2025Code
PEGNet: A Physics-Embedded Graph Network for Long-Term Stable Multiphysics SimulationCan Yang, Zhenzhong Wang, Junyuan Liu et al.
Accurate and efficient simulations of physical phenomena governed by partial differential equations (PDEs) are important for scientific and engineering progress. While traditional numerical solvers are powerful, they are often computationally expensive. Recently, data-driven methods have emerged as alternatives, but they frequently suffer from error accumulation and limited physical consistency, especially in multiphysics and complex geometries. To address these challenges, we propose PEGNet, a Physics-Embedded Graph Network that incorporates PDE-guided message passing to redesign the graph neural network architecture. By embedding key PDE dynamics like convection, viscosity, and diffusion into distinct message functions, the model naturally integrates physical constraints into its forward propagation, producing more stable and physically consistent solutions. Additionally, a hierarchical architecture is employed to capture multi-scale features, and physical regularization is integrated into the loss function to further enforce adherence to governing physics. We evaluated PEGNet on benchmarks, including custom datasets for respiratory airflow and drug delivery, showing significant improvements in long-term prediction accuracy and physical consistency over existing methods. Our code is available at https://github.com/Yanghuoshan/PEGNet.
NEApr 8, 2023
Efficiently Tackling Million-Dimensional Multiobjective Problems: A Direction Sampling and Fine-Tuning ApproachHaokai Hong, Min Jiang, Qiuzhen Lin et al.
We define very large-scale multiobjective optimization problems as optimizing multiple objectives (VLSMOPs) with more than 100,000 decision variables. These problems hold substantial significance, given the ubiquity of real-world scenarios necessitating the optimization of hundreds of thousands, if not millions, of variables. However, the larger dimension in VLSMOPs intensifies the curse of dimensionality and poses significant challenges for existing large-scale evolutionary multiobjective algorithms, rendering them more difficult to solve within the constraints of practical computing resources. To overcome this issue, we propose a novel approach called the very large-scale multiobjective optimization framework (VMOF). The method efficiently samples general yet suitable evolutionary directions in the very large-scale space and subsequently fine-tunes these directions to locate the Pareto-optimal solutions. To sample the most suitable evolutionary directions for different solutions, Thompson sampling is adopted for its effectiveness in recommending from a very large number of items within limited historical evaluations. Furthermore, a technique is designed for fine-tuning directions specific to tracking Pareto-optimal solutions. To understand the designed framework, we present our analysis of the framework and then evaluate VMOF using widely recognized benchmarks and real-world problems spanning dimensions from 100 to 1,000,000. Experimental results demonstrate that our method exhibits superior performance not only on LSMOPs but also on VLSMOPs when compared to existing algorithms.
LGSep 4, 2024
Adversarial Learning for Neural PDE Solvers with Sparse DataYunpeng Gong, Yongjie Hou, Zhenzhong Wang et al.
Neural network solvers for partial differential equations (PDEs) have made significant progress, yet they continue to face challenges related to data scarcity and model robustness. Traditional data augmentation methods, which leverage symmetry or invariance, impose strong assumptions on physical systems that often do not hold in dynamic and complex real-world applications. To address this research gap, this study introduces a universal learning strategy for neural network PDEs, named Systematic Model Augmentation for Robust Training (SMART). By focusing on challenging and improving the model's weaknesses, SMART reduces generalization error during training under data-scarce conditions, leading to significant improvements in prediction accuracy across various PDE scenarios. The effectiveness of the proposed method is demonstrated through both theoretical analysis and extensive experimentation. The code will be available.
CVSep 11, 2024
Phy124: Fast Physics-Driven 4D Content Generation from a Single ImageJiajing Lin, Zhenzhong Wang, Yongjie Hou et al.
4D content generation focuses on creating dynamic 3D objects that change over time. Existing methods primarily rely on pre-trained video diffusion models, utilizing sampling processes or reference videos. However, these approaches face significant challenges. Firstly, the generated 4D content often fails to adhere to real-world physics since video diffusion models do not incorporate physical priors. Secondly, the extensive sampling process and the large number of parameters in diffusion models result in exceedingly time-consuming generation processes. To address these issues, we introduce Phy124, a novel, fast, and physics-driven method for controllable 4D content generation from a single image. Phy124 integrates physical simulation directly into the 4D generation process, ensuring that the resulting 4D content adheres to natural physical laws. Phy124 also eliminates the use of diffusion models during the 4D dynamics generation phase, significantly speeding up the process. Phy124 allows for the control of 4D dynamics, including movement speed and direction, by manipulating external forces. Extensive experiments demonstrate that Phy124 generates high-fidelity 4D content with significantly reduced inference times, achieving stateof-the-art performance. The code and generated 4D content are available at the provided link: https://anonymous.4open.science/r/BBF2/.
CVSep 26, 2024
Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform OptimizationYunpeng Gong, Qingyuan Zeng, Dejun Xu et al.
In recent years, despite significant advancements in adversarial attack research, the security challenges in cross-modal scenarios, such as the transferability of adversarial attacks between infrared, thermal, and RGB images, have been overlooked. These heterogeneous image modalities collected by different hardware devices are widely prevalent in practical applications, and the substantial differences between modalities pose significant challenges to attack transferability. In this work, we explore a novel cross-modal adversarial attack strategy, termed multiform attack. We propose a dual-layer optimization framework based on gradient-evolution, facilitating efficient perturbation transfer between modalities. In the first layer of optimization, the framework utilizes image gradients to learn universal perturbations within each modality and employs evolutionary algorithms to search for shared perturbations with transferability across different modalities through secondary optimization. Through extensive testing on multiple heterogeneous datasets, we demonstrate the superiority and robustness of Multiform Attack compared to existing techniques. This work not only enhances the transferability of cross-modal adversarial attacks but also provides a new perspective for understanding security vulnerabilities in cross-modal systems.
CVJul 18, 2024
Beyond Augmentation: Empowering Model Robustness under Extreme Capture EnvironmentsYunpeng Gong, Yongjie Hou, Chuangliang Zhang et al.
Person Re-identification (re-ID) in computer vision aims to recognize and track individuals across different cameras. While previous research has mainly focused on challenges like pose variations and lighting changes, the impact of extreme capture conditions is often not adequately addressed. These extreme conditions, including varied lighting, camera styles, angles, and image distortions, can significantly affect data distribution and re-ID accuracy. Current research typically improves model generalization under normal shooting conditions through data augmentation techniques such as adjusting brightness and contrast. However, these methods pay less attention to the robustness of models under extreme shooting conditions. To tackle this, we propose a multi-mode synchronization learning (MMSL) strategy . This approach involves dividing images into grids, randomly selecting grid blocks, and applying data augmentation methods like contrast and brightness adjustments. This process introduces diverse transformations without altering the original image structure, helping the model adapt to extreme variations. This method improves the model's generalization under extreme conditions and enables learning diverse features, thus better addressing the challenges in re-ID. Extensive experiments on a simulated test set under extreme conditions have demonstrated the effectiveness of our method. This approach is crucial for enhancing model robustness and adaptability in real-world scenarios, supporting the future development of person re-identification technology.
CVJul 18, 2024
Beyond Dropout: Robust Convolutional Neural Networks Based on Local Feature MaskingYunpeng Gong, Chuangliang Zhang, Yongjie Hou et al.
In the contemporary of deep learning, where models often grapple with the challenge of simultaneously achieving robustness against adversarial attacks and strong generalization capabilities, this study introduces an innovative Local Feature Masking (LFM) strategy aimed at fortifying the performance of Convolutional Neural Networks (CNNs) on both fronts. During the training phase, we strategically incorporate random feature masking in the shallow layers of CNNs, effectively alleviating overfitting issues, thereby enhancing the model's generalization ability and bolstering its resilience to adversarial attacks. LFM compels the network to adapt by leveraging remaining features to compensate for the absence of certain semantic features, nurturing a more elastic feature learning mechanism. The efficacy of LFM is substantiated through a series of quantitative and qualitative assessments, collectively showcasing a consistent and significant improvement in CNN's generalization ability and resistance against adversarial attacks--a phenomenon not observed in current and prior methodologies. The seamless integration of LFM into established CNN frameworks underscores its potential to advance both generalization and adversarial robustness within the deep learning paradigm. Through comprehensive experiments, including robust person re-identification baseline generalization experiments and adversarial attack experiments, we demonstrate the substantial enhancements offered by LFM in addressing the aforementioned challenges. This contribution represents a noteworthy stride in advancing robust neural network architectures.
CVJul 18, 2024
Cross-Task Attack: A Self-Supervision Generative Framework Based on Attention ShiftQingyuan Zeng, Yunpeng Gong, Min Jiang
Studying adversarial attacks on artificial intelligence (AI) systems helps discover model shortcomings, enabling the construction of a more robust system. Most existing adversarial attack methods only concentrate on single-task single-model or single-task cross-model scenarios, overlooking the multi-task characteristic of artificial intelligence systems. As a result, most of the existing attacks do not pose a practical threat to a comprehensive and collaborative AI system. However, implementing cross-task attacks is highly demanding and challenging due to the difficulty in obtaining the real labels of different tasks for the same picture and harmonizing the loss functions across different tasks. To address this issue, we propose a self-supervised Cross-Task Attack framework (CTA), which utilizes co-attention and anti-attention maps to generate cross-task adversarial perturbation. Specifically, the co-attention map reflects the area to which different visual task models pay attention, while the anti-attention map reflects the area that different visual task models neglect. CTA generates cross-task perturbations by shifting the attention area of samples away from the co-attention map and closer to the anti-attention map. We conduct extensive experiments on multiple vision tasks and the experimental results confirm the effectiveness of the proposed design for adversarial attacks.
AIAug 16, 2024
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text ModelsQingyuan Zeng, Zhenzhong Wang, Yiu-ming Cheung et al.
While image-to-text models have demonstrated significant advancements in various vision-language tasks, they remain susceptible to adversarial attacks. Existing white-box attacks on image-to-text models require access to the architecture, gradients, and parameters of the target model, resulting in low practicality. Although the recently proposed gray-box attacks have improved practicality, they suffer from semantic loss during the training process, which limits their targeted attack performance. To advance adversarial attacks of image-to-text models, this paper focuses on a challenging scenario: decision-based black-box targeted attacks where the attackers only have access to the final output text and aim to perform targeted attacks. Specifically, we formulate the decision-based black-box targeted attack as a large-scale optimization problem. To efficiently solve the optimization problem, a three-stage process \textit{Ask, Attend, Attack}, called \textit{AAA}, is proposed to coordinate with the solver. \textit{Ask} guides attackers to create target texts that satisfy the specific semantics. \textit{Attend} identifies the crucial regions of the image for attacking, thus reducing the search space for the subsequent \textit{Attack}. \textit{Attack} uses an evolutionary algorithm to attack the crucial regions, where the attacks are semantically related to the target texts of \textit{Ask}, thus achieving targeted attacks without semantic loss. Experimental results on transformer-based and CNN+RNN-based image-to-text models confirmed the effectiveness of our proposed \textit{AAA}.
NEMar 23
Training-Free Diffusion-Driven Modeling of Pareto Set Evolution for Dynamic Multiobjective OptimizationJian Guan, Huolong Wu, Zhenzhong Wang et al.
Dynamic multiobjective optimization problems (DMOPs) feature time-varying objectives, which cause the Pareto optimal solution (POS) set to drift over time and make it difficult to maintain both convergence and diversity under limited response time. Many existing prediction-based dynamic multiobjective evolutionary algorithms (DMOEAs) either depend on learned models with nontrivial training cost or employ one-step population mapping, which may overlook the gradual nature of POS evolution. This paper proposes DD-DMOEA, a training-free diffusion-based dynamic response mechanism for DMOPs. The key idea is to treat the POS obtained in the previous environment as a "noisy" sample set and to guide its evolution toward the current POS through an analytically constructed multi-step denoising process. A knee-point-based auxiliary strategy is used to specify the target region in the new environment, and an explicit probability-density formulation is derived to compute the denoising update without neural training. To reduce the risk of misleading guidance caused by knee-point prediction errors, an uncertainty-aware scheme adaptively adjusts the guidance strength according to the historical prediction deviation. Experiments on the CEC2018 dynamic multiobjective benchmarks show that DD-DMOEA achieves competitive or better convergence-diversity performance and provides faster dynamic response than several state-of-the-art DMOEAs.
AIFeb 12
Neuro-Symbolic Multitasking: A Unified Framework for Discovering Generalizable Solutions to PDE FamiliesYipeng Huang, Dejun Xu, Zexin Lin et al.
Solving Partial Differential Equations (PDEs) is fundamental to numerous scientific and engineering disciplines. A common challenge arises from solving the PDE families, which are characterized by sharing an identical mathematical structure but varying in specific parameters. Traditional numerical methods, such as the finite element method, need to independently solve each instance within a PDE family, which incurs massive computational cost. On the other hand, while recent advancements in machine learning PDE solvers offer impressive computational speed and accuracy, their inherent ``black-box" nature presents a considerable limitation. These methods primarily yield numerical approximations, thereby lacking the crucial interpretability provided by analytical expressions, which are essential for deeper scientific insight. To address these limitations, we propose a neuro-assisted multitasking symbolic PDE solver framework for PDE family solving, dubbed NMIPS. In particular, we employ multifactorial optimization to simultaneously discover the analytical solutions of PDEs. To enhance computational efficiency, we devise an affine transfer method by transferring learned mathematical structures among PDEs in a family, avoiding solving each PDE from scratch. Experimental results across multiple cases demonstrate promising improvements over existing baselines, achieving up to a $\sim$35.7% increase in accuracy while providing interpretable analytical solutions.
FLU-DYNNov 9, 2025
Cross-Field Interface-Aware Neural Operators for Multiphase Flow SimulationZhenZhong Wang, Xin Zhang, Jun Liao et al.
Multiphase flow systems, with their complex dynamics, field discontinuities, and interphase interactions, pose significant computational challenges for traditional numerical solvers. While neural operators offer efficient alternatives, they often struggle to achieve high-resolution numerical accuracy in these systems. This limitation primarily stems from the inherent spatial heterogeneity and the scarcity of high-quality training data in multiphase flows. In this work, we propose the Interface Information-Aware Neural Operator (IANO), a novel framework that explicitly leverages interface information as a physical prior to enhance the prediction accuracy. The IANO architecture introduces two key components: 1) An interface-aware multiple function encoding mechanism jointly models multiple physical fields and interfaces, thus capturing the high-frequency physical features at the interface. 2) A geometry-aware positional encoding mechanism further establishes the relationship between interface information, physical variables, and spatial positions, enabling it to achieve pointwise super-resolution prediction even in the low-data regimes. Experimental results demonstrate that IANO outperforms baselines by $\sim$10\% in accuracy for multiphase flow simulations while maintaining robustness under data-scarce and noise-perturbed conditions.
CVJan 19, 2024Code
Exploring Color Invariance through Image-Level Ensemble LearningYunpeng Gong, Jiaquan Li, Lifei Chen et al.
In the field of computer vision, the persistent presence of color bias, resulting from fluctuations in real-world lighting and camera conditions, presents a substantial challenge to the robustness of models. This issue is particularly pronounced in complex wide-area surveillance scenarios, such as person re-identification and industrial dust segmentation, where models often experience a decline in performance due to overfitting on color information during training, given the presence of environmental variations. Consequently, there is a need to effectively adapt models to cope with the complexities of camera conditions. To address this challenge, this study introduces a learning strategy named Random Color Erasing, which draws inspiration from ensemble learning. This strategy selectively erases partial or complete color information in the training data without disrupting the original image structure, thereby achieving a balanced weighting of color features and other features within the neural network. This approach mitigates the risk of overfitting and enhances the model's ability to handle color variation, thereby improving its overall robustness. The approach we propose serves as an ensemble learning strategy, characterized by robust interpretability. A comprehensive analysis of this methodology is presented in this paper. Across various tasks such as person re-identification and semantic segmentation, our approach consistently improves strong baseline methods. Notably, in comparison to existing methods that prioritize color robustness, our strategy significantly enhances performance in cross-domain scenarios. The code available at \url{https://github.com/layumi/Person\_reID\_baseline\_pytorch/blob/master/random\_erasing.py} or \url{https://github.com/finger-monkey/Data-Augmentation}.
CVJan 18, 2024Code
Cross-Modality Perturbation Synergy Attack for Person Re-identificationYunpeng Gong, Zhun Zhong, Yansong Qu et al.
In recent years, there has been significant research focusing on addressing security concerns in single-modal person re-identification (ReID) systems that are based on RGB images. However, the safety of cross-modality scenarios, which are more commonly encountered in practical applications involving images captured by infrared cameras, has not received adequate attention. The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities. For instance, infrared images are typically grayscale, unlike visible images that contain color information. Existing attack methods have primarily focused on the characteristics of the visible image modality, overlooking the features of other modalities and the variations in data distribution among different modalities. This oversight can potentially undermine the effectiveness of these methods in image retrieval across diverse modalities. This study represents the first exploration into the security of cross-modality ReID models and proposes a universal perturbation attack specifically designed for cross-modality ReID. This attack optimizes perturbations by leveraging gradients from diverse modality data, thereby disrupting the discriminator and reinforcing the differences between modalities. We conducted experiments on three widely used cross-modality datasets, namely RegDB, SYSU, and LLCM. The results not only demonstrate the effectiveness of our method but also provide insights for future improvements in the robustness of cross-modality ReID systems. The code will be available at https://github.com/finger-monkey/cmps__attack.
CVNov 21, 2024
NexusSplats: Efficient 3D Gaussian Splatting in the WildYuzhou Tang, Dejun Xu, Yongjie Hou et al.
Photorealistic 3D reconstruction of unstructured real-world scenes remains challenging due to complex illumination variations and transient occlusions. Existing methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) struggle with inefficient light decoupling and structure-agnostic occlusion handling. To address these limitations, we propose NexusSplats, an approach tailored for efficient and high-fidelity 3D scene reconstruction under complex lighting and occlusion conditions. In particular, NexusSplats leverages a hierarchical light decoupling strategy that performs centralized appearance learning, efficiently and effectively decoupling varying lighting conditions. Furthermore, a structure-aware occlusion handling mechanism is developed, establishing a nexus between 3D and 2D structures for fine-grained occlusion handling. Experimental results demonstrate that NexusSplats achieves state-of-the-art rendering quality and reduces the number of total parameters by 65.4\%, leading to 2.7$\times$ faster reconstruction.
LGApr 19, 2024
Multi-View Subgraph Neural Networks: Self-Supervised Learning with Scarce Labeled DataZhenzhong Wang, Qingyuan Zeng, Wanyu Lin et al.
While graph neural networks (GNNs) have become the de-facto standard for graph-based node classification, they impose a strong assumption on the availability of sufficient labeled samples. This assumption restricts the classification performance of prevailing GNNs on many real-world applications suffering from low-data regimes. Specifically, features extracted from scarce labeled nodes could not provide sufficient supervision for the unlabeled samples, leading to severe over-fitting. In this work, we point out that leveraging subgraphs to capture long-range dependencies can augment the representation of a node with homophily properties, thus alleviating the low-data regime. However, prior works leveraging subgraphs fail to capture the long-range dependencies among nodes. To this end, we present a novel self-supervised learning framework, called multi-view subgraph neural networks (Muse), for handling long-range dependencies. In particular, we propose an information theory-based identification mechanism to identify two types of subgraphs from the views of input space and latent space, respectively. The former is to capture the local structure of the graph, while the latter captures the long-range dependencies among nodes. By fusing these two views of subgraphs, the learned representations can preserve the topological properties of the graph at large, including the local structure and long-range dependencies, thus maximizing their expressiveness for downstream node classification tasks. Experimental results show that Muse outperforms the alternative methods on node classification tasks with limited labeled data.
LGOct 8, 2025
StruSR: Structure-Aware Symbolic Regression with Physics-Informed Taylor GuidanceYunpeng Gong, Sihan Lan, Can Yang et al.
Symbolic regression aims to find interpretable analytical expressions by searching over mathematical formula spaces to capture underlying system behavior, particularly in scientific modeling governed by physical laws. However, traditional methods lack mechanisms for extracting structured physical priors from time series observations, making it difficult to capture symbolic expressions that reflect the system's global behavior. In this work, we propose a structure-aware symbolic regression framework, called StruSR, that leverages trained Physics-Informed Neural Networks (PINNs) to extract locally structured physical priors from time series data. By performing local Taylor expansions on the outputs of the trained PINN, we obtain derivative-based structural information to guide symbolic expression evolution. To assess the importance of expression components, we introduce a masking-based attribution mechanism that quantifies each subtree's contribution to structural alignment and physical residual reduction. These sensitivity scores steer mutation and crossover operations within genetic programming, preserving substructures with high physical or structural significance while selectively modifying less informative components. A hybrid fitness function jointly minimizes physics residuals and Taylor coefficient mismatch, ensuring consistency with both the governing equations and the local analytical behavior encoded by the PINN. Experiments on benchmark PDE systems demonstrate that StruSR improves convergence speed, structural fidelity, and expression interpretability compared to conventional baselines, offering a principled paradigm for physics-grounded symbolic discovery.
CLMar 21, 2025
Automating Adjudication of Cardiovascular Events Using Large Language ModelsSonish Sivarajkumar, Kimia Ameri, Chuqin Li et al.
Cardiovascular events, such as heart attacks and strokes, remain a leading cause of mortality globally, necessitating meticulous monitoring and adjudication in clinical trials. This process, traditionally performed manually by clinical experts, is time-consuming, resource-intensive, and prone to inter-reviewer variability, potentially introducing bias and hindering trial progress. This study addresses these critical limitations by presenting a novel framework for automating the adjudication of cardiovascular events in clinical trials using Large Language Models (LLMs). We developed a two-stage approach: first, employing an LLM-based pipeline for event information extraction from unstructured clinical data and second, using an LLM-based adjudication process guided by a Tree of Thoughts approach and clinical endpoint committee (CEC) guidelines. Using cardiovascular event-specific clinical trial data, the framework achieved an F1-score of 0.82 for event extraction and an accuracy of 0.68 for adjudication. Furthermore, we introduce the CLEART score, a novel, automated metric specifically designed for evaluating the quality of AI-generated clinical reasoning in adjudicating cardiovascular events. This approach demonstrates significant potential for substantially reducing adjudication time and costs while maintaining high-quality, consistent, and auditable outcomes in clinical trials. The reduced variability and enhanced standardization also allow for faster identification and mitigation of risks associated with cardiovascular therapies.
CVNov 25, 2024
Phys4DGen: Physics-Compliant 4D Generation with Multi-Material Composition PerceptionJiajing Lin, Zhenzhong Wang, Dejun Xu et al.
4D content generation aims to create dynamically evolving 3D content that responds to specific input objects such as images or 3D representations. Current approaches typically incorporate physical priors to animate 3D representations, but these methods suffer from significant limitations: they not only require users lacking physics expertise to manually specify material properties but also struggle to effectively handle the generation of multi-material composite objects. To address these challenges, we propose Phys4DGen, a novel 4D generation framework that integrates multi-material composition perception with physical simulation. The framework achieves automated, physically plausible 4D generation through three innovative modules: first, the 3D Material Grouping module partitions heterogeneous material regions on 3D representations' surfaces via semantic segmentation; second, the Internal Physical Structure Discovery module constructs the mechanical structure of object interiors; finally, we distill physical prior knowledge from multimodal large language models to enable rapid and automatic material properties identification for both objects' surfaces and interiors. Experiments on both synthetic and real-world datasets demonstrate that Phys4DGen can generate high-fidelity 4D content with physical realism in open-world scenarios, significantly outperforming state-of-the-art methods.
CVNov 24, 2025
A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-IdentificationYunpeng Gong, Yongjie Hou, Jiangming Shi et al.
Sketch based person re-identification aims to match hand-drawn sketches with RGB surveillance images, but remains challenging due to significant modality gaps and limited annotated data. To address this, we introduce KTCAA, a theoretically grounded framework for few-shot cross-modal generalization. Motivated by generalization theory, we identify two key factors influencing target domain risk: (1) domain discrepancy, which quantifies the alignment difficulty between source and target distributions; and (2) perturbation invariance, which evaluates the model's robustness to modality shifts. Based on these insights, we propose two components: (1) Alignment Augmentation (AA), which applies localized sketch-style transformations to simulate target distributions and facilitate progressive alignment; and (2) Knowledge Transfer Catalyst (KTC), which enhances invariance by introducing worst-case perturbations and enforcing consistency. These modules are jointly optimized under a meta-learning paradigm that transfers alignment knowledge from data-rich RGB domains to sketch-based scenarios. Experiments on multiple benchmarks demonstrate that KTCAA achieves state-of-the-art performance, particularly in data-scarce conditions.
CVAug 19, 2025
VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel OptimizationJiajing Lin, Shu Jiang, Qingyuan Zeng et al.
The intrinsic dynamics of an object governs its physical behavior in the real world, playing a critical role in enabling physically plausible interactive simulation with 3D assets. Existing methods have attempted to infer the intrinsic dynamics of objects from visual observations, but generally face two major challenges: one line of work relies on manually defined constitutive priors, making it difficult to generalize to complex scenarios; the other models intrinsic dynamics using neural networks, resulting in limited interpretability and poor generalization. To address these challenges, we propose VisionLaw, a bilevel optimization framework that infers interpretable expressions of intrinsic dynamics from visual observations. At the upper level, we introduce an LLMs-driven decoupled constitutive evolution strategy, where LLMs are prompted as a knowledgeable physics expert to generate and revise constitutive laws, with a built-in decoupling mechanism that substantially reduces the search complexity of LLMs. At the lower level, we introduce a vision-guided constitutive evaluation mechanism, which utilizes visual simulation to evaluate the consistency between the generated constitutive law and the underlying intrinsic dynamics, thereby guiding the upper-level evolution. Experiments on both synthetic and real-world datasets demonstrate that VisionLaw can effectively infer interpretable intrinsic dynamics from visual observations. It significantly outperforms existing state-of-the-art methods and exhibits strong generalization for interactive simulation in novel scenarios.
CRAug 10, 2025
Fading the Digital Ink: A Universal Black-Box Attack Framework for 3DGS Watermarking SystemsQingyuan Zeng, Shu Jiang, Jiajing Lin et al.
With the rise of 3D Gaussian Splatting (3DGS), a variety of digital watermarking techniques, embedding either 1D bitstreams or 2D images, are used for copyright protection. However, the robustness of these watermarking techniques against potential attacks remains underexplored. This paper introduces the first universal black-box attack framework, the Group-based Multi-objective Evolutionary Attack (GMEA), designed to challenge these watermarking systems. We formulate the attack as a large-scale multi-objective optimization problem, balancing watermark removal with visual quality. In a black-box setting, we introduce an indirect objective function that blinds the watermark detector by minimizing the standard deviation of features extracted by a convolutional network, thus rendering the feature maps uninformative. To manage the vast search space of 3DGS models, we employ a group-based optimization strategy to partition the model into multiple, independent sub-optimization problems. Experiments demonstrate that our framework effectively removes both 1D and 2D watermarks from mainstream 3DGS watermarking methods while maintaining high visual fidelity. This work reveals critical vulnerabilities in existing 3DGS copyright protection schemes and calls for the development of more robust watermarking systems.
NEJun 3, 2025
CR-BLEA: Contrastive Ranking for Adaptive Resource Allocation in Bilevel Evolutionary AlgorithmsDejun Xu, Jijia Chen, Gary G. Yen et al.
Bilevel optimization poses a significant computational challenge due to its nested structure, where each upper-level candidate solution requires solving a corresponding lower-level problem. While evolutionary algorithms (EAs) are effective at navigating such complex landscapes, their high resource demands remain a key bottleneck -- particularly the redundant evaluation of numerous unpromising lower-level tasks. Despite recent advances in multitasking and transfer learning, resource waste persists. To address this issue, we propose a novel resource allocation framework for bilevel EAs that selectively identifies and focuses on promising lower-level tasks. Central to our approach is a contrastive ranking network that learns relational patterns between paired upper- and lower-level solutions online. This knowledge guides a reference-based ranking strategy that prioritizes tasks for optimization and adaptively controls resampling based on estimated population quality. Comprehensive experiments across five state-of-the-art bilevel algorithms show that our framework significantly reduces computational cost while preserving -- or even enhancing -- solution accuracy. This work offers a generalizable strategy to improve the efficiency of bilevel EAs, paving the way for more scalable bilevel optimization.
AIDec 20, 2024
Enhancing Large-scale UAV Route Planing with Global and Local Features via Reinforcement Graph FusionTao Zhou, Kai Ye, Zeyu Shi et al.
Numerous remarkable advancements have been made in accuracy, speed, and parallelism for solving the Unmanned Aerial Vehicle Route Planing (UAVRP). However, existing UAVRP solvers face challenges when attempting to scale effectively and efficiently for larger instances. In this paper, we present a generalization framework that enables current UAVRP solvers to robustly extend their capabilities to larger instances, accommodating up to 10,000 points, using widely recognized test sets. The UAVRP under a large number of patrol points is a typical large-scale TSP problem.Our proposed framework comprises three distinct steps. Firstly, we employ Delaunay triangulation to extract subgraphs from large instances while preserving global features. Secondly, we utilize an embedded TSP solver to obtain sub-results, followed by graph fusion. Finally, we implement a decoding strategy customizable to the user's requirements, resulting in high-quality solutions, complemented by a warming-up process for the heatmap. To demonstrate the flexibility of our approach, we integrate two representative TSP solvers into our framework and conduct a comprehensive comparative analysis against existing algorithms using large TSP benchmark datasets. The results unequivocally demonstrate that our framework efficiently scales existing TSP solvers to handle large instances and consistently outperforms state-of-the-art (SOTA) methods. Furthermore, since our proposed framework does not necessitate additional training or fine-tuning, we believe that its generality can significantly advance research on end-to-end UAVRP solvers, enabling the application of a broader range of methods to real-world scenarios.
CVNov 26, 2024
GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose EstimationXin Liu, Shibei Xue, Dezong Zhao et al.
6D object pose estimation is crucial for robotic perception and precise manipulation. Occlusion and incomplete object visibility are common challenges in this task, but existing pose refinement methods often struggle to handle these issues effectively. To tackle this problem, we propose a global motion-guided recurrent flow estimation method called GMFlow for pose estimation. GMFlow overcomes local ambiguities caused by occlusion or missing parts by seeking global explanations. We leverage the object's structural information to extend the motion of visible parts of the rigid body to its invisible regions. Specifically, we capture global contextual information through a linear attention mechanism and guide local motion information to generate global motion estimates. Furthermore, we introduce object shape constraints in the flow iteration process, making flow estimation suitable for pose estimation scenarios. Experiments on the LM-O and YCB-V datasets demonstrate that our method outperforms existing techniques in accuracy while maintaining competitive computational efficiency.
ROJan 26, 2022
A Cooperation-Aware Lane Change Method for Autonomous VehiclesZihao Sheng, Lin Liu, Shibei Xue et al.
Lane change for autonomous vehicles (AVs) is an important but challenging task in complex dynamic traffic environments. Due to difficulties in guarantee safety as well as a high efficiency, AVs are inclined to choose relatively conservative strategies for lane change. To avoid the conservatism, this paper presents a cooperation-aware lane change method utilizing interactions between vehicles. We first propose an interactive trajectory prediction method to explore possible cooperations between an AV and the others. Further, an evaluation is designed to make a decision on lane change, in which safety, efficiency and comfort are taken into consideration. Thereafter, we propose a motion planning algorithm based on model predictive control (MPC), which incorporates AV's decision and surrounding vehicles' interactive behaviors into constraints so as to avoid collisions during lane change. Quantitative testing results show that compared with the methods without an interactive prediction, our method enhances driving efficiencies of the AV and other vehicles by 14.8$\%$ and 2.6$\%$ respectively, which indicates that a proper utilization of vehicle interactions can effectively reduce the conservatism of the AV and promote the cooperation between the AV and others.
NEJul 16, 2021
Solving Large-Scale Multi-Objective Optimization via Probabilistic Prediction ModelHaokai Hong, Kai Ye, Min Jiang et al.
The main feature of large-scale multi-objective optimization problems (LSMOP) is to optimize multiple conflicting objectives while considering thousands of decision variables at the same time. An efficient LSMOP algorithm should have the ability to escape the local optimal solution from the huge search space and find the global optimal. Most of the current researches focus on how to deal with decision variables. However, due to the large number of decision variables, it is easy to lead to high computational cost. Maintaining the diversity of the population is one of the effective ways to improve search efficiency. In this paper, we propose a probabilistic prediction model based on trend prediction model and generating-filtering strategy, called LT-PPM, to tackle the LSMOP. The proposed method enhances the diversity of the population through importance sampling. At the same time, due to the adoption of an individual-based evolution mechanism, the computational cost of the proposed method is independent of the number of decision variables, thus avoiding the problem of exponential growth of the search space. We compared the proposed algorithm with several state-of-the-art algorithms for different benchmark functions. The experimental results and complexity analysis have demonstrated that the proposed algorithm has significant improvement in terms of its performance and computational efficiency in large-scale multi-objective optimization.
NEFeb 24, 2021
An Online Prediction Approach Based on Incremental Support Vector Machine for Dynamic Multiobjective OptimizationDejun Xu, Min Jiang, Weizhen Hu et al.
Real-world multiobjective optimization problems usually involve conflicting objectives that change over time, which requires the optimization algorithms to quickly track the Pareto optimal front (POF) when the environment changes. In recent years, evolutionary algorithms based on prediction models have been considered promising. However, most existing approaches only make predictions based on the linear correlation between a finite number of optimal solutions in two or three previous environments. These incomplete information extraction strategies may lead to low prediction accuracy in some instances. In this paper, a novel prediction algorithm based on incremental support vector machine (ISVM) is proposed, called ISVM-DMOEA. We treat the solving of dynamic multiobjective optimization problems (DMOPs) as an online learning process, using the continuously obtained optimal solution to update an incremental support vector machine without discarding the solution information at earlier time. ISVM is then used to filter random solutions and generate an initial population for the next moment. To overcome the obstacle of insufficient training samples, a synthetic minority oversampling strategy is implemented before the training of ISVM. The advantage of this approach is that the nonlinear correlation between solutions can be explored online by ISVM, and the information contained in all historical optimal solutions can be exploited to a greater extent. The experimental results and comparison with chosen state-of-the-art algorithms demonstrate that the proposed algorithm can effectively tackle dynamic multiobjective optimization problems.
NEJan 8, 2021
Manifold Interpolation for Large-Scale Multi-Objective Optimization via Generative Adversarial NetworksZhenzhong Wang, Haokai Hong, Kai Ye et al.
Large-scale multiobjective optimization problems (LSMOPs) are characterized as involving hundreds or even thousands of decision variables and multiple conflicting objectives. An excellent algorithm for solving LSMOPs should find Pareto-optimal solutions with diversity and escape from local optima in the large-scale search space. Previous research has shown that these optimal solutions are uniformly distributed on the manifold structure in the low-dimensional space. However, traditional evolutionary algorithms for solving LSMOPs have some deficiencies in dealing with this structural manifold, resulting in poor diversity, local optima, and inefficient searches. In this work, a generative adversarial network (GAN)-based manifold interpolation framework is proposed to learn the manifold and generate high-quality solutions on this manifold, thereby improving the performance of evolutionary algorithms. We compare the proposed algorithm with several state-of-the-art algorithms on large-scale multiobjective benchmark functions. Experimental results have demonstrated the significant improvements achieved by this framework in solving LSMOPs.
RODec 24, 2020
Evolutionary Gait Transfer of Multi-Legged Robots in Complex TerrainsMin Jiang, Guokun Chi, Geqiang Pan et al.
Robot gait optimization is the task of generating an optimal control trajectory under various internal and external constraints. Given the high dimensions of control space, this problem is particularly challenging for multi-legged robots walking in complex and unknown environments. Existing literatures often regard the gait generation as an optimization problem and solve the gait optimization from scratch for robots walking in a specific environment. However, such approaches do not consider the use of pre-acquired knowledge which can be useful in improving the quality and speed of motion generation in complex environments. To address the issue, this paper proposes a transfer learning-based evolutionary framework for multi-objective gait optimization, named Tr-GO. The idea is to initialize a high-quality population by using the technique of transfer learning, so any kind of population-based optimization algorithms can be seamlessly integrated into this framework. The advantage is that the generated gait can not only dynamically adapt to different environments and tasks, but also simultaneously satisfy multiple design specifications (e.g., speed, stability). The experimental results show the effectiveness of the proposed framework for the gait optimization problem based on three multi-objective evolutionary algorithms: NSGA-II, RM-MEDA and MOPSO. When transferring the pre-acquired knowledge from the plain terrain to various inclined and rugged ones, the proposed Tr-GO framework accelerates the evolution process by a minimum of 3-4 times compared with non-transferred scenarios.
LGOct 20, 2019
Online Bagging for Anytime Transfer LearningGuokun Chi, Min Jiang, Xing Gao et al.
Transfer learning techniques have been widely used in the reality that it is difficult to obtain sufficient labeled data in the target domain, but a large amount of auxiliary data can be obtained in the relevant source domain. But most of the existing methods are based on offline data. In practical applications, it is often necessary to face online learning problems in which the data samples are achieved sequentially. In this paper, We are committed to applying the ensemble approach to solving the problem of online transfer learning so that it can be used in anytime setting. More specifically, we propose a novel online transfer learning framework, which applies the idea of online bagging methods to anytime transfer learning problems, and constructs strong classifiers through online iterations of the usefulness of multiple weak classifiers. Further, our algorithm also provides two extension schemes to reduce the impact of negative transfer. Experiments on three real data sets show that the effectiveness of our proposed algorithms.
NEOct 19, 2019
Evolutionary Dynamic Multi-objective Optimization Via Regression Transfer LearningZhenzhong Wang, Min Jiang, Xing Gao et al.
Dynamic multi-objective optimization problems (DMOPs) remain a challenge to be settled, because of conflicting objective functions change over time. In recent years, transfer learning has been proven to be a kind of effective approach in solving DMOPs. In this paper, a novel transfer learning based dynamic multi-objective optimization algorithm (DMOA) is proposed called regression transfer learning prediction based DMOA (RTLP-DMOA). The algorithm aims to generate an excellent initial population to accelerate the evolutionary process and improve the evolutionary performance in solving DMOPs. When an environmental change is detected, a regression transfer learning prediction model is constructed by reusing the historical population, which can predict objective values. Then, with the assistance of this prediction model, some high-quality solutions with better predicted objective values are selected as the initial population, which can improve the performance of the evolutionary process. We compare the proposed algorithm with three state-of-the-art algorithms on benchmark functions. Experimental results indicate that the proposed algorithm can significantly enhance the performance of static multi-objective optimization algorithms and is competitive in convergence and diversity.
NEOct 19, 2019
Solving Dynamic Multi-objective Optimization Problems Using Incremental Support Vector MachineWeizhen Hu, Min Jiang, Xing Gao et al.
The main feature of the Dynamic Multi-objective Optimization Problems (DMOPs) is that optimization objective functions will change with times or environments. One of the promising approaches for solving the DMOPs is reusing the obtained Pareto optimal set (POS) to train prediction models via machine learning approaches. In this paper, we train an Incremental Support Vector Machine (ISVM) classifier with the past POS, and then the solutions of the DMOP we want to solve at the next moment are filtered through the trained ISVM classifier. A high-quality initial population will be generated by the ISVM classifier, and a variety of different types of population-based dynamic multi-objective optimization algorithms can benefit from the population. To verify this idea, we incorporate the proposed approach into three evolutionary algorithms, the multi-objective particle swarm optimization(MOPSO), Nondominated Sorting Genetic Algorithm II (NSGA-II), and the Regularity Model-based multi-objective estimation of distribution algorithm(RE-MEDA). We employ experiments to test these algorithms, and experimental results show the effectiveness.
AIOct 19, 2019
Solving dynamic multi-objective optimization problems via support vector machineMin Jiang, Weizhen Hu, Liming Qiu et al.
Dynamic Multi-objective Optimization Problems (DMOPs) refer to optimization problems that objective functions will change with time. Solving DMOPs implies that the Pareto Optimal Set (POS) at different moments can be accurately found, and this is a very difficult job due to the dynamics of the optimization problems. The POS that have been obtained in the past can help us to find the POS of the next time more quickly and accurately. Therefore, in this paper we present a Support Vector Machine (SVM) based Dynamic Multi-Objective Evolutionary optimization Algorithm, called SVM-DMOEA. The algorithm uses the POS that has been obtained to train a SVM and then take the trained SVM to classify the solutions of the dynamic optimization problem at the next moment, and thus it is able to generate an initial population which consists of different individuals recognized by the trained SVM. The initial populuation can be fed into any population based optimization algorithm, e.g., the Nondominated Sorting Genetic Algorithm II (NSGA-II), to get the POS at that moment. The experimental results show the validity of our proposed approach.
CVAug 8, 2017
Unconstrained Face Detection and Open-Set Face Recognition ChallengeManuel Günther, Peiyun Hu, Christian Herrmann et al.
Face detection and recognition benchmarks have shifted toward more difficult environments. The challenge presented in this paper addresses the next step in the direction of automatic detection and identification of people from outdoor surveillance cameras. While face detection has shown remarkable success in images collected from the web, surveillance cameras include more diverse occlusions, poses, weather conditions and image blur. Although face verification or closed-set face identification have surpassed human capabilities on some datasets, open-set identification is much more complex as it needs to reject both unknown identities and false accepts from the face detector. We show that unconstrained face detection can approach high detection rates albeit with moderate false accept rates. By contrast, open-set face recognition is currently weak and requires much more attention.
NEDec 19, 2016
Transfer Learning based Dynamic Multiobjective Optimization AlgorithmsMin Jiang, Zhongqiang Huang, Liming Qiu et al.
One of the major distinguishing features of the dynamic multiobjective optimization problems (DMOPs) is the optimization objectives will change over time, thus tracking the varying Pareto-optimal front becomes a challenge. One of the promising solutions is reusing the "experiences" to construct a prediction model via statistical machine learning approaches. However most of the existing methods ignore the non-independent and identically distributed nature of data used to construct the prediction model. In this paper, we propose an algorithmic framework, called Tr-DMOEA, which integrates transfer learning and population-based evolutionary algorithm for solving the DMOPs. This approach takes the transfer learning method as a tool to help reuse the past experience for speeding up the evolutionary process, and at the same time, any population based multiobjective algorithms can benefit from this integration without any extensive modifications. To verify this, we incorporate the proposed approach into the development of three well-known algorithms, nondominated sorting genetic algorithm II (NSGA-II), multiobjective particle swarm optimization (MOPSO), and the regularity model-based multiobjective estimation of distribution algorithm (RM-MEDA), and then employ twelve benchmark functions to test these algorithms as well as compare with some chosen state-of-the-art designs. The experimental results confirm the effectiveness of the proposed method through exploiting machine learning technology.