CV | Feb 6
Unsupervised MR-US Multimodal Image Registration with Multilevel Correlation Pyramidal Optimization
Jiazheng Wang, Zeyu Liu, Min Liu et al.
Surgical navigation based on multimodal image registration plays a significant role in intraoperative guidance by showing surgeons the position of the target area relative to critical anatomical structures. However, differences between modalities, together with intraoperative deformation caused by tissue displacement and removal, make it difficult to register preoperative and intraoperative multimodal images effectively. To address the multimodal image registration challenges in Learn2Reg 2025, we design an unsupervised multimodal medical image registration method based on Multilevel Correlation Pyramidal Optimization (MCPO). First, the features of each modality are extracted with the modality-independent neighborhood descriptor, and the multimodal images are mapped to the feature space. Second, a multilevel pyramidal fusion optimization mechanism achieves global optimization and local detail complementation of the displacement field through dense correlation analysis and weight-balanced coupled convex optimization of input features at different scales. Our method focuses on the ReMIND2Reg task in Learn2Reg 2025, where it achieved first place in both the validation phase and the test phase. MCPO is also validated on the Resect dataset, achieving an average target registration error (TRE) of 1.798 mm, which demonstrates the broad applicability of our method to preoperative-to-intraoperative image registration. The code is available at https://github.com/wjiazheng/MCPO.
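The abstract reports an average TRE in millimetres. As a minimal sketch of how such a figure is commonly computed (the mean Euclidean distance between paired landmarks after registration), the snippet below may help. The function name and the landmark coordinates are hypothetical and are not taken from the MCPO code or the Resect dataset.

```python
import math


def target_registration_error(warped_landmarks, target_landmarks):
    """Mean Euclidean distance between paired landmarks, in their own units (e.g. mm)."""
    if len(warped_landmarks) != len(target_landmarks):
        raise ValueError("landmark lists must be paired one-to-one")
    if not warped_landmarks:
        raise ValueError("at least one landmark pair is required")
    distances = [math.dist(p, q) for p, q in zip(warped_landmarks, target_landmarks)]
    return sum(distances) / len(distances)


# Hypothetical landmark coordinates in millimetres (illustrative only).
warped = [(10.0, 20.0, 30.0), (5.0, 5.0, 5.0)]
target = [(10.0, 20.0, 33.0), (5.0, 9.0, 8.0)]
tre = target_registration_error(warped, target)
```

Here the two landmark pairs are 3 mm and 5 mm apart, so `tre` is their mean. A real evaluation would first warp the landmarks with the estimated displacement field, which this sketch does not attempt.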
CV | Jul 10, 2025
EPIC: Efficient Prompt Interaction for Text-Image Classification
Xinyao Yu, Hao Sun, Zeyu Ling et al.
In recent years, large-scale pre-trained multimodal models (LMMs) have emerged to integrate the vision and language modalities, achieving considerable success in multimodal tasks such as text-image classification. The growing size of LMMs, however, results in a significant computational cost for fine-tuning these models on downstream tasks. Hence, prompt-based interaction strategies have been studied to align modalities more efficiently. In this context, we propose a novel efficient prompt-based multimodal interaction strategy, namely Efficient Prompt Interaction for text-image Classification (EPIC). Specifically, we utilize temporal prompts on intermediate layers and integrate different modalities with similarity-based prompt interaction to enable sufficient information exchange between modalities. With this approach, our method requires fewer computational resources and fewer trainable parameters (about 1% of the foundation model) than other fine-tuning strategies. Furthermore, it demonstrates superior performance on the UPMC-Food101 and SNLI-VE datasets, while achieving comparable performance on the MM-IMDB dataset.
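The abstract does not spell out how similarity-based prompt interaction works. One plausible reading, offered purely as an assumption, is that each prompt vector of one modality receives a mix of the other modality's prompts, weighted by cosine similarity. The sketch below illustrates that reading only. The function names, the softmax weighting, and the toy vectors are hypothetical, and the temporal prompts on intermediate layers are not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interact(own_prompts, other_prompts):
    """Add to each prompt a similarity-weighted mix of the other modality's prompts."""
    updated = []
    for prompt in own_prompts:
        weights = softmax([cosine(prompt, other) for other in other_prompts])
        mix = [
            sum(w * other[i] for w, other in zip(weights, other_prompts))
            for i in range(len(prompt))
        ]
        updated.append([p + m for p, m in zip(prompt, mix)])
    return updated


# Toy 2-dimensional prompts (assumed values, not from the paper).
text_prompts = [[1.0, 0.0], [0.0, 1.0]]
image_prompts = [[1.0, 0.0], [0.0, 1.0]]
new_text = interact(text_prompts, image_prompts)
```

In this toy case each text prompt is pulled most strongly toward the image prompt it already resembles. Only the prompt vectors change, which is consistent with the small trainable-parameter budget the abstract describes.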
LG | Nov 27, 2024
Physics-Informed Deep Learning Model for Line-integral Diagnostics Across Fusion Devices
Cong Wang, Weizhe Yang, Haiping Wang et al.
Rapid reconstruction of 2D plasma profiles from line-integral measurements is important in nuclear fusion. This paper introduces a physics-informed model architecture called Onion that can enhance model performance and be adapted to various backbone networks. A model under Onion incorporates physical information through a multiplication process and applies a physics-informed loss function based on the principle of line integration. Prediction results demonstrate that the additional input of physical information improves the deep learning model's ability, reducing the average relative error E_1 between the reconstructed profiles and the target profiles by approximately 0.84x10^(-2) on synthetic datasets and about 0.06x10^(-2) on experimental datasets. Furthermore, using the Softplus activation function in the final two fully connected layers improves model performance, reducing E_1 by approximately 1.06x10^(-2) on synthetic datasets and about 0.11x10^(-2) on experimental datasets. The physics-informed loss function is shown to correct the model's predictions, bringing the back-projections closer to the actual inputs and reducing the errors associated with inversion algorithms. In addition, we have developed a synthetic data model to generate customized line-integral diagnostic datasets and have collected soft x-ray diagnostic datasets from EAST and HL-2A. This study reduces reconstruction errors and accelerates the development of surrogate models in fusion research.
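To make the idea of a loss based on line integration concrete, the following is a minimal sketch under stated assumptions: each line integral is approximated as a weighted sum over a flattened 2D profile, and the loss is the mean squared mismatch between those back-projections and the measurements. The function names, the chord weights, and the squared-error form are assumptions for illustration and are not Onion's actual formulation. The multiplication step mentioned in the abstract is not modelled.

```python
import math


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); the output is always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def back_project(profile, chord_weights):
    """Approximate each line integral as a weighted sum over the flattened profile."""
    return [sum(w * p for w, p in zip(weights, profile)) for weights in chord_weights]


def line_integral_loss(profile, chord_weights, measurements):
    """Mean squared mismatch between back-projected and measured line integrals."""
    projected = back_project(profile, chord_weights)
    return sum((a - b) ** 2 for a, b in zip(projected, measurements)) / len(measurements)


# Hypothetical 2x2 profile flattened row-major, viewed by two chords:
# one crossing the top row and one crossing the left column (unit path lengths).
raw_outputs = [0.0, 1.0, -1.0, 2.0]
profile = [softplus(v) for v in raw_outputs]
chords = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]]

measured = back_project(profile, chords)
loss_consistent = line_integral_loss(profile, chords, measured)
loss_perturbed = line_integral_loss(profile, chords, [m + 0.5 for m in measured])
```

The loss is zero when the predicted profile reproduces the measurements and grows as the back-projections drift from them. Softplus is used here only to show how a final activation can keep a predicted emissivity profile non-negative.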
CV | Jan 26, 2024
Memory-Inspired Temporal Prompt Interaction for Text-Image Classification
Xinyao Yu, Hao Sun, Ziwei Niu et al.
In recent years, large-scale pre-trained multimodal models (LMMs) have emerged to integrate the vision and language modalities, achieving considerable success in various natural language processing and computer vision tasks. The growing size of LMMs, however, results in a significant computational cost for fine-tuning these models on downstream tasks. Hence, prompt-based interaction strategies have been studied to align modalities more efficiently. In this context, we propose a novel prompt-based multimodal interaction strategy inspired by human memory, namely Memory-Inspired Temporal Prompt Interaction (MITP). Our method involves two stages, as in human memory: the acquiring stage, and the consolidation and activation stage. We utilize temporal prompts on intermediate layers to imitate the acquiring stage, leverage similarity-based prompt interaction to imitate memory consolidation, and employ a prompt generation strategy to imitate memory activation. The main strength of our approach is that the prompt vectors interact on intermediate layers, enabling sufficient information exchange between modalities with few trainable parameters and low memory usage. We achieve competitive results on several datasets with relatively small memory usage and 2.0M trainable parameters (about 1% of the pre-trained foundation model).
CL | Sep 30, 2021
COVID-19 Fake News Detection Using Bidirectional Encoder Representations from Transformers Based Models
Yuxiang Wang, Yongheng Zhang, Xuebo Li et al.
Nowadays, the development of social media allows people to access the latest news easily. During the COVID-19 pandemic, it is important for people to access the news so that they can take corresponding protective measures. However, fake news is flooding social media and is a serious issue, especially during a global pandemic. Misleading fake news can cause significant losses for individuals and for society. COVID-19 fake news detection has therefore become a novel and important task in the NLP field. However, fake news always contains both a correct portion and an incorrect portion, which increases the difficulty of the classification task. In this paper, we fine-tune the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model as our base model. We add BiLSTM layers and CNN layers on top of the fine-tuned BERT model, with the BERT parameters either frozen or not frozen. The evaluation results show that our best model (the fine-tuned BERT model with frozen parameters plus BiLSTM layers) achieves state-of-the-art results on the COVID-19 fake news detection task. We also explore keyword evaluation methods using our best model and evaluate the model performance after removing keywords.
IR | Nov 1, 2020
Future-Aware Diverse Trends Framework for Recommendation
Yujie Lu, Shengyu Zhang, Yingxuan Huang et al.
In recommender systems, modeling user-item behaviors is essential for user representation learning. Existing sequential recommenders consider the sequential correlations between historically interacted items to capture users' historical preferences. However, since users' preferences are by nature time-evolving and diversified, modeling only the historical preference (without being aware of the time-evolving trends of preferences) can be inferior for recommending complementary or fresh items and thus hurt the effectiveness of recommender systems. In this paper, we bridge the gap between the past preference and the potential future preference by proposing the future-aware diverse trends (FAT) framework. By future-aware, we mean that for each inspected user we construct future sequences from other similar users, comprising behaviors that happen after the last behavior of the inspected user, based on a proposed neighbor behavior extractor. By diverse trends, we mean that, since future preferences can be diversified, we propose a diverse trends extractor and a time-aware mechanism to represent the possible trends of a given user's preferences with multiple vectors. We leverage both the representations of historical preference and possible future trends to obtain the final recommendation. The quantitative and qualitative results from relatively extensive experiments on real-world datasets demonstrate that the proposed framework not only outperforms state-of-the-art sequential recommendation methods across various metrics, but also makes complementary and fresh recommendations.
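The future-aware step above can be sketched as follows, assuming a simple notion of user similarity. The abstract does not say how similar users are chosen, so the Jaccard overlap used here, the function names, and the toy interaction logs are all hypothetical and are not the paper's neighbor behavior extractor.

```python
def jaccard(a, b):
    """Jaccard overlap between two item collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def future_sequences(target_user, logs, top_k=2):
    """Collect what the most similar users did after the target user's last behavior."""
    target = logs[target_user]
    last_time = max(t for t, _ in target)
    target_items = [item for _, item in target]

    # Score every other user by overlap with the target's history up to last_time.
    scored = []
    for user, events in logs.items():
        if user == target_user:
            continue
        history = [item for t, item in events if t <= last_time]
        scored.append((jaccard(target_items, history), user))
    scored.sort(reverse=True)

    # Keep only behaviors that happen strictly after the target's last behavior.
    result = {}
    for score, user in scored[:top_k]:
        future = [item for t, item in sorted(logs[user]) if t > last_time]
        if score > 0 and future:
            result[user] = future
    return result


# Toy logs of (timestamp, item) pairs (illustrative only).
logs = {
    "u1": [(1, "a"), (2, "b")],
    "u2": [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
    "u3": [(1, "x"), (3, "y")],
}
futures = future_sequences("u1", logs)
```

In this toy example only `u2` shares history with `u1`, so its later items form the future sequence. The framework described in the abstract would then encode such sequences into multiple trend vectors, which this sketch does not cover.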