LGSep 1, 2022Code
Physics-informed MTA-UNet: Prediction of Thermal Stress and Thermal Deformation of SatellitesZeyu Cao, Wen Yao, Wei Peng et al.
The rapid analysis of thermal stress and deformation plays a pivotal role in the thermal control measures and optimization of the structural design of satellites. For achieving real-time thermal stress and thermal deformation analysis of satellite motherboards, this paper proposes a novel Multi-Task Attention UNet (MTA-UNet) neural network which combines the advantages of both Multi-Task Learning (MTL) and U-Net with attention mechanism. Besides, a physics-informed strategy is used in the training process, where partial differential equations (PDEs) are integrated into the loss functions as residual terms. Finally, an uncertainty-based loss balancing approach is applied to weight different loss functions of multiple training tasks. Experimental results show that the proposed MTA-UNet effectively improves the prediction accuracy of multiple physics tasks compared with Single-Task Learning (STL) models. In addition, the physics-informed method brings less error in the prediction of each task, especially on small data sets. The code can be downloaded at: \url{https://github.com/KomorebiTso/MTA-UNet}.
LGMay 2, 2022
RANG: A Residual-based Adaptive Node Generation Method for Physics-Informed Neural NetworksWei Peng, Weien Zhou, Xiaoya Zhang et al.
Learning solutions of partial differential equations (PDEs) with Physics-Informed Neural Networks (PINNs) is an attractive alternative approach to traditional solvers due to its flexibility and ease of incorporating observed data. Despite the success of PINNs in accurately solving a wide variety of PDEs, the method still requires improvements in terms of computational efficiency. One possible improvement idea is to optimize the generation of training point sets. Residual-based adaptive sampling and quasi-uniform sampling approaches have been each applied to improve the training effects of PINNs, respectively. To benefit from both methods, we propose the Residual-based Adaptive Node Generation (RANG) approach for efficient training of PINNs, which is based on a variable density nodal distribution method for RBF-FD. The method is also enhanced by a memory mechanism to further improve training stability. We conduct experiments on three linear PDEs and three nonlinear PDEs with various node generation methods, through which the accuracy and efficiency of the proposed method compared to the predominant uniform sampling approach is verified numerically.
LGMar 15, 2022
A physics and data co-driven surrogate modeling approach for temperature field prediction on irregular geometric domainKairui Bao, Wen Yao, Xiaoya Zhang et al.
In the whole aircraft structural optimization loop, thermal analysis plays a very important role. But it faces a severe computational burden when directly applying traditional numerical analysis tools, especially when each optimization involves repetitive parameter modification and thermal analysis followed. Recently, with the fast development of deep learning, several Convolutional Neural Network (CNN) surrogate models have been introduced to overcome this obstacle. However, for temperature field prediction on irregular geometric domains (TFP-IGD), CNN can hardly be competent since most of them stem from processing for regular images. To alleviate this difficulty, we propose a novel physics and data co-driven surrogate modeling method. First, after adapting the Bezier curve in geometric parameterization, a body-fitted coordinate mapping is introduced to generate coordinate transforms between the irregular physical plane and regular computational plane. Second, a physics-driven CNN surrogate with partial differential equation (PDE) residuals as a loss function is utilized for fast meshing (meshing surrogate); then, we present a data-driven surrogate model based on the multi-level reduced-order method, aiming to learn solutions of temperature field in the above regular computational plane (thermal surrogate). Finally, combining the grid position information provided by the meshing surrogate with the scalar temperature field information provided by the thermal surrogate (combined model), we reach an end-to-end surrogate model from geometric parameters to temperature field prediction on an irregular geometric domain. Numerical results demonstrate that our method can significantly improve accuracy prediction on a smaller dataset while reducing the training time when compared with other CNN methods.
CVAug 17, 2023
Edit Temporal-Consistent Videos with Image Diffusion ModelYuanzhi Wang, Yong Li, Xiaoya Zhang et al.
Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing, yielding impressive zero-shot video editing performance. Nonetheless, the generated videos usually show spatial irregularities and temporal inconsistencies as the temporal characteristics of videos have not been faithfully modeled. In this paper, we propose an elegant yet effective Temporal-Consistent Video Editing (TCVE) method to mitigate the temporal inconsistency challenge for robust text-guided video editing. In addition to the utilization of a pretrained T2I 2D Unet for spatial content manipulation, we establish a dedicated temporal Unet architecture to faithfully capture the temporal coherence of the input video sequences. Furthermore, to establish coherence and interrelation between the spatial-focused and temporal-focused components, a cohesive spatial-temporal modeling unit is formulated. This unit effectively interconnects the temporal Unet with the pretrained 2D Unet, thereby enhancing the temporal consistency of the generated videos while preserving the capacity for video content manipulation. Quantitative experimental results and visualization results demonstrate that TCVE achieves state-of-the-art performance in both video temporal consistency and video editing capability, surpassing existing benchmarks in the field.
LGOct 19, 2022
Robust Regression with Highly Corrupted Data via Physics Informed Neural NetworksWei Peng, Wen Yao, Weien Zhou et al.
Physics-informed neural networks (PINNs) have been proposed to solve two main classes of problems: data-driven solutions and data-driven discovery of partial differential equations. This task becomes prohibitive when such data is highly corrupted due to the possible sensor mechanism failing. We propose the Least Absolute Deviation based PINN (LAD-PINN) to reconstruct the solution and recover unknown parameters in PDEs - even if spurious data or outliers corrupt a large percentage of the observations. To further improve the accuracy of recovering hidden physics, the two-stage Median Absolute Deviation based PINN (MAD-PINN) is proposed, where LAD-PINN is employed as an outlier detector followed by MAD screening out the highly corrupted data. Then the vanilla PINN or its variants can be subsequently applied to exploit the remaining normal data. Through several examples, including Poisson's equation, wave equation, and steady or unsteady Navier-Stokes equations, we illustrate the generalizability, accuracy and efficiency of the proposed algorithms for recovering governing equations from noisy and highly corrupted measurement data.
CVApr 11, 2023
Benchmarking the Physical-world Adversarial Robustness of Vehicle DetectionTianyuan Zhang, Yisong Xiao, Xiaoya Zhang et al.
Adversarial attacks in the physical world can harm the robustness of detection models. Evaluating the robustness of detection models in the physical world can be challenging due to the time-consuming and labor-intensive nature of many experiments. Thus, virtual simulation experiments can provide a solution to this challenge. However, there is no unified detection benchmark based on virtual simulation environment. To address this challenge, we proposed an instant-level data generation pipeline based on the CARLA simulator. Using this pipeline, we generated the DCI dataset and conducted extensive experiments on three detection models and three physical adversarial attacks. The dataset covers 7 continuous and 1 discrete scenes, with over 40 angles, 20 distances, and 20,000 positions. The results indicate that Yolo v6 had strongest resistance, with only a 6.59% average AP drop, and ASA was the most effective attack algorithm with a 14.51% average AP reduction, twice that of other algorithms. Static scenes had higher recognition AP, and results under different weather conditions were similar. Adversarial attack algorithm improvement may be approaching its 'limitation'.
LGMar 29, 2022
Consistency regularization-based Deep Polynomial Chaos Neural Network Method for Reliability AnalysisXiaohu Zheng, Wen Yao, Yunyang Zhang et al.
Polynomial chaos expansion (PCE) is a powerful surrogate model-based reliability analysis method. Generally, a PCE model with a higher expansion order is usually required to obtain an accurate surrogate model for some complex non-linear stochastic systems. However, the high-order PCE increases the number of labeled data required for solving the expansion coefficients. To alleviate this problem, this paper proposes a consistency regularization-based deep polynomial chaos neural network (Deep PCNN) method, including the low-order adaptive PCE model (the auxiliary model) and the high-order polynomial chaos neural network (the main model). The expansion coefficients of the main model are parameterized into the learnable weights of the polynomial chaos neural network, realizing iterative learning of expansion coefficients to obtain more accurate high-order PCE models. The auxiliary model uses a proposed consistency regularization loss function to assist in training the main model. The consistency regularization-based Deep PCNN method can significantly reduce the number of labeled data in constructing a high-order PCE model without losing accuracy by using few labeled data and abundant unlabeled data. A numerical example validates the effectiveness of the consistency regularization-based Deep PCNN method, and then this method is applied to analyze the reliability of two aerospace engineering systems.
IRApr 23, 2025Code
MMHCL: Multi-Modal Hypergraph Contrastive Learning for RecommendationXu Guo, Tong Zhang, Fuyun Wang et al.
The burgeoning presence of multimodal content-sharing platforms propels the development of personalized recommender systems. Previous works usually suffer from data sparsity and cold-start problems, and may fail to adequately explore semantic user-product associations from multimodal data. To address these issues, we propose a novel Multi-Modal Hypergraph Contrastive Learning (MMHCL) framework for user recommendation. For a comprehensive information exploration from user-product relations, we construct two hypergraphs, i.e. a user-to-user (u2u) hypergraph and an item-to-item (i2i) hypergraph, to mine shared preferences among users and intricate multimodal semantic resemblance among items, respectively. This process yields denser second-order semantics that are fused with first-order user-item interaction as complementary to alleviate the data sparsity issue. Then, we design a contrastive feature enhancement paradigm by applying synergistic contrastive learning. By maximizing/minimizing the mutual information between second-order (e.g. shared preference pattern for users) and first-order (information of selected items for users) embeddings of the same/different users and items, the feature distinguishability can be effectively enhanced. Compared with using sparse primary user-item interaction only, our MMHCL obtains denser second-order hypergraphs and excavates more abundant shared attributes to explore the user-product associations, which to a certain extent alleviates the problems of data sparsity and cold-start. Extensive experiments have comprehensively demonstrated the effectiveness of our method. Our code is publicly available at: https://github.com/Xu107/MMHCL.
CVDec 16, 2024
Re-Attentional Controllable Video Diffusion EditingYuanzhi Wang, Yong Li, Mengyi Liu et al.
Editing videos with textual guidance has garnered popularity due to its streamlined process which mandates users to solely edit the text prompt corresponding to the source video. Recent studies have explored and exploited large-scale text-to-image diffusion models for text-guided video editing, resulting in remarkable video editing capabilities. However, they may still suffer from some limitations such as mislocated objects, incorrect number of objects. Therefore, the controllability of video editing remains a formidable challenge. In this paper, we aim to challenge the above limitations by proposing a Re-Attentional Controllable Video Diffusion Editing (ReAtCo) method. Specially, to align the spatial placement of the target objects with the edited text prompt in a training-free manner, we propose a Re-Attentional Diffusion (RAD) to refocus the cross-attention activation responses between the edited text prompt and the target video during the denoising stage, resulting in a spatially location-aligned and semantically high-fidelity manipulated video. In particular, to faithfully preserve the invariant region content with less border artifacts, we propose an Invariant Region-guided Joint Sampling (IRJS) strategy to mitigate the intrinsic sampling errors w.r.t the invariant regions at each denoising timestep and constrain the generated content to be harmonized with the invariant region content. Experimental results verify that ReAtCo consistently improves the controllability of video diffusion editing and achieves superior video editing performance.
CVJun 8, 2025
Technical Report for ICRA 2025 GOOSE 3D Semantic Segmentation Challenge: Adaptive Point Cloud Understanding for Heterogeneous Robotic SystemsXiaoya Zhang
This technical report presents the implementation details of the winning solution for the ICRA 2025 GOOSE 3D Semantic Segmentation Challenge. This challenge focuses on semantic segmentation of 3D point clouds from diverse unstructured outdoor environments collected from multiple robotic platforms. This problem was addressed by implementing Point Prompt Tuning (PPT) integrated with Point Transformer v3 (PTv3) backbone, enabling adaptive processing of heterogeneous LiDAR data through platform-specific conditioning and cross-dataset class alignment strategies. The model is trained without requiring additional external data. As a result, this approach achieved substantial performance improvements with mIoU increases of up to 22.59% on challenging platforms compared to the baseline PTv3 model, demonstrating the effectiveness of adaptive point cloud understanding for field robotics applications.
ROJun 24, 2025
Robotics Under Construction: Challenges on Job SitesHaruki Uchiito, Akhilesh Bhat, Koji Kusaka et al.
As labor shortages and productivity stagnation increasingly challenge the construction industry, automation has become essential for sustainable infrastructure development. This paper presents an autonomous payload transportation system as an initial step toward fully unmanned construction sites. Our system, based on the CD110R-3 crawler carrier, integrates autonomous navigation, fleet management, and GNSS-based localization to facilitate material transport in construction site environments. While the current system does not yet incorporate dynamic environment adaptation algorithms, we have begun fundamental investigations into external-sensor based perception and mapping system. Preliminary results highlight the potential challenges, including navigation in evolving terrain, environmental perception under construction-specific conditions, and sensor placement optimization for improving autonomy and efficiency. Looking forward, we envision a construction ecosystem where collaborative autonomous agents dynamically adapt to site conditions, optimizing workflow and reducing human intervention. This paper provides foundational insights into the future of robotics-driven construction automation and identifies critical areas for further technological development.
IRApr 13, 2025
Multi-Modal Hypergraph Enhanced LLM Learning for RecommendationXu Guo, Tong Zhang, Yuanzhi Wang et al.
The burgeoning presence of Large Language Models (LLM) is propelling the development of personalized recommender systems. Most existing LLM-based methods fail to sufficiently explore the multi-view graph structure correlations inherent in recommendation scenarios. To this end, we propose a novel framework, Hypergraph Enhanced LLM Learning for multimodal Recommendation (HeLLM), designed to equip LLMs with the capability to capture intricate higher-order semantic correlations by fusing graph-level contextual signals with sequence-level behavioral patterns. In the recommender pre-training phase, we design a user hypergraph to uncover shared interest preferences among users and an item hypergraph to capture correlations within multimodal similarities among items. The hypergraph convolution and synergistic contrastive learning mechanism are introduced to enhance the distinguishability of learned representations. In the LLM fine-tuning phase, we inject the learned graph-structured embeddings directly into the LLM's architecture and integrate sequential features capturing each user's chronological behavior. This process enables hypergraphs to leverage graph-structured information as global context, enhancing the LLM's ability to perceive complex relational patterns and integrate multimodal information, while also modeling local temporal dynamics. Extensive experiments demonstrate the superiority of our proposed method over state-of-the-art baselines, confirming the advantages of fusing hypergraph-based context with sequential user behavior in LLMs for recommendation.
LGFeb 14, 2022
Physics-Informed Deep Monte Carlo Quantile Regression method for Interval Multilevel Bayesian Network-based Satellite Heat Reliability AnalysisXiaohu Zheng, Wen Yao, Zhiqiang Gong et al.
Temperature field reconstruction is essential for analyzing satellite heat reliability. As a representative machine learning model, the deep convolutional neural network (DCNN) is a powerful tool for reconstructing the satellite temperature field. However, DCNN needs a lot of labeled data to learn its parameters, which is contrary to the fact that actual satellite engineering can only acquire noisy unlabeled data. To solve the above problem, this paper proposes an unsupervised method, i.e., the physics-informed deep Monte Carlo quantile regression method, for reconstructing temperature field and quantifying the aleatoric uncertainty caused by data noise. For one thing, the proposed method combines a deep convolutional neural network with the known physics knowledge to reconstruct an accurate temperature field using only monitoring point temperatures. For another thing, the proposed method can quantify the aleatoric uncertainty by the Monte Carlo quantile regression. Based on the reconstructed temperature field and the quantified aleatoric uncertainty, this paper models an interval multilevel Bayesian Network to analyze satellite heat reliability. Two case studies are used to validate the proposed method.
CVSep 15, 2021
FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial AttackDonghua Wang, Tingsong Jiang, Jialiang Sun et al.
Physical adversarial attacks in object detection have attracted increasing attention. However, most previous works focus on hiding the objects from the detector by generating an individual adversarial patch, which only covers the planar part of the vehicle's surface and fails to attack the detector in physical scenarios for multi-view, long-distance and partially occluded objects. To bridge the gap between digital attacks and physical attacks, we exploit the full 3D vehicle surface to propose a robust Full-coverage Camouflage Attack (FCA) to fool detectors. Specifically, we first try rendering the nonplanar camouflage texture over the full vehicle surface. To mimic the real-world environment conditions, we then introduce a transformation function to transfer the rendered camouflaged vehicle into a photo realistic scenario. Finally, we design an efficient loss function to optimize the camouflage texture. Experiments show that the full-coverage camouflage attack can not only outperform state-of-the-art methods under various test cases but also generalize to different environments, vehicles, and object detectors. The code of FCA will be available at: https://idrl-lab.github.io/Full-coverage-camouflage-adversarial-attack/.
LGJul 23, 2021
A novel meta-learning initialization method for physics-informed neural networksXu Liu, Xiaoya Zhang, Wei Peng et al.
Physics-informed neural networks (PINNs) have been widely used to solve various scientific computing problems. However, large training costs limit PINNs for some real-time applications. Although some works have been proposed to improve the training efficiency of PINNs, few consider the influence of initialization. To this end, we propose a New Reptile initialization based Physics-Informed Neural Network (NRPINN). The original Reptile algorithm is a meta-learning initialization method based on labeled data. PINNs can be trained with less labeled data or even without any labeled data by adding partial differential equations (PDEs) as a penalty term into the loss function. Inspired by this idea, we propose the new Reptile initialization to sample more tasks from the parameterized PDEs and adapt the penalty term of the loss. The new Reptile initialization can acquire initialization parameters from related tasks by supervised, unsupervised, and semi-supervised learning. Then, PINNs with initialization parameters can efficiently solve PDEs. Besides, the new Reptile initialization can also be used for the variants of PINNs. Finally, we demonstrate and verify the NRPINN considering both forward problems, including solving Poisson, Burgers, and Schrödinger equations, as well as inverse problems, where unknown parameters in the PDEs are estimated. Experimental results show that the NRPINN training is much faster and achieves higher accuracy than PINNs with other initialization methods.