CVJul 16, 2024Code
REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image MatchingHan Nie, Bin Luo, Jun Liu et al.
We present REMM, a rotation-equivariant framework for end-to-end multimodal image matching, which fully encodes rotational differences of descriptors in the whole matching pipeline. Previous learning-based methods mainly focus on extracting modal-invariant descriptors, while consistently ignoring the rotational invariance. In this paper, we demonstrate that our REMM is very useful for multimodal image matching, including multimodal feature learning module and cyclic shift module. We first learn modal-invariant features through the multimodal feature learning module. Then, we design the cyclic shift module to rotationally encode the descriptors, greatly improving the performance of rotation-equivariant matching, which makes them robust to any angle. To validate our method, we establish a comprehensive rotation and scale-matching benchmark for evaluating the anti-rotation performance of multimodal images, which contains a combination of multi-angle and multi-scale transformations from four publicly available datasets. Extensive experiments show that our method outperforms existing methods in benchmarking and generalizes well to independent datasets. Additionally, we conducted an in-depth analysis of the key components of the REMM to validate the improvements brought about by the cyclic shift module. Code and dataset at https://github.com/HanNieWHU/REMM.
73.9ARMay 27
FT-Pilot: Automated Fault-Tolerant RTL Rewriting via Vulnerability-Guided LLMsWeixing Liu, Zizhen Liu, Jing Ye et al.
As integrated circuit technologies continue to scale toward advanced process nodes, the continual reduction in node capacitance and supply voltage has made digital systems increasingly vulnerable to soft errors. Although traditional full-chip hardening methods can improve reliability, they often incur unacceptable area and power overhead, making selective hardening a more practical engineering solution. However, existing approaches typically rely on time-consuming fault-injection simulation to determine hardening locations through vulnerability analysis, and still depend heavily on manual strategy selection and RTL modification during the hardening stage, making them ill-suited for efficient automated reliability optimization at early design stages. To address these challenges, this paper proposes FT-Pilot, a GNN-guided LLM framework for automatic RTL soft-error hardening. The framework first employs a GNN to identify critical vulnerable assets directly at the RTL level, and then introduces an LLM-driven rewriting engine composed of an analyzer and a rewriter, which performs RTL-level fault-tolerant code rewriting with the support of dual-knowledge-base retrieval-augmented generation and an automatic repair mechanism. Experimental results show that the proposed framework can automatically generate hardened RTL designs that are syntactically correct, functionally correct, and synthesizable across multiple benchmark circuits, while significantly reducing output error rates under soft-error scenarios. This work provides a practical automated path toward shift-left reliability optimization at the RTL level.
CVJan 31, 2024Code
Source-free Domain Adaptive Object Detection in Remote Sensing ImagesWeixing Liu, Jun Liu, Xin Su et al.
Recent studies have used unsupervised domain adaptive object detection (UDAOD) methods to bridge the domain gap in remote sensing (RS) images. However, UDAOD methods typically assume that the source domain data can be accessed during the domain adaptation process. This setting is often impractical in the real world due to RS data privacy and transmission difficulty. To address this challenge, we propose a practical source-free object detection (SFOD) setting for RS images, which aims to perform target domain adaptation using only the source pre-trained model. We propose a new SFOD method for RS images consisting of two parts: perturbed domain generation and alignment. The proposed multilevel perturbation constructs the perturbed domain in a simple yet efficient form by perturbing the domain-variant features at the image level and feature level according to the color and style bias. The proposed multilevel alignment calculates feature and label consistency between the perturbed domain and the target domain across the teacher-student network, and introduces the distillation of feature prototype to mitigate the noise of pseudo-labels. By requiring the detector to be consistent in the perturbed domain and the target domain, the detector is forced to focus on domaininvariant features. Extensive results of three synthetic-to-real experiments and three cross-sensor experiments have validated the effectiveness of our method which does not require access to source domain RS images. Furthermore, experiments on computer vision datasets show that our method can be extended to other fields as well. Our code will be available at: https://weixliu.github.io/ .
CVFeb 25, 2025Code
PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image MatchingHan Nie, Bin Luo, Jun Liu et al.
The ideal goal of image matching is to achieve stable and efficient performance in unseen domains. However, many existing learning-based optical-SAR image matching methods, despite their effectiveness in specific scenarios, exhibit limited generalization and struggle to adapt to practical applications. Repeatedly training or fine-tuning matching models to address domain differences is not only not elegant enough but also introduces additional computational overhead and data production costs. In recent years, general foundation models have shown great potential for enhancing generalization. However, the disparity in visual domains between natural and remote sensing images poses challenges for their direct application. Therefore, effectively leveraging foundation models to improve the generalization of optical-SAR image matching remains challenge. To address the above challenges, we propose PromptMID, a novel approach that constructs modality-invariant descriptors using text prompts based on land use classification as priors information for optical and SAR image matching. PromptMID extracts multi-scale modality-invariant features by leveraging pre-trained diffusion models and visual foundation models (VFMs), while specially designed feature aggregation modules effectively fuse features across different granularities. Extensive experiments on optical-SAR image datasets from four diverse regions demonstrate that PromptMID outperforms state-of-the-art matching methods, achieving superior results in both seen and unseen domains and exhibiting strong cross-domain generalization capabilities. The source code will be made publicly available https://github.com/HanNieWHU/PromptMID.
CVJun 9, 2020
Can Synthetic Data Improve Object Detection Results for Remote Sensing Images?Weixing Liu, Jun Liu, Bin Luo
Deep learning approaches require enough training samples to perform well, but it is a challenge to collect enough real training data and label them manually. In this letter, we propose the use of realistic synthetic data with a wide distribution to improve the performance of remote sensing image aircraft detection. Specifically, to increase the variability of synthetic data, we randomly set the parameters during rendering, such as the size of the instance and the class of background images. In order to make the synthetic images more realistic, we then refine the synthetic images at the pixel level using CycleGAN with real unlabeled images. We also fine-tune the model with a small amount of real data, to obtain a higher accuracy. Experiments on NWPU VHR-10, UCAS-AOD and DIOR datasets demonstrate that the proposed method can be applied for augmenting insufficient real data.