CVMar 20Code
Unbiased Dynamic Multimodal FusionShicai Wei, Kaijie Zhang, Luyi Chen et al.
Traditional multimodal methods often assume static modality quality, which limits their adaptability in dynamic real-world scenarios. Thus, dynamical multimodal methods are proposed to assess modality quality and adjust their contribution accordingly. However, they typically rely on empirical metrics, failing to measure the modality quality when noise levels are extremely low or high. Moreover, existing methods usually assume that the initial contribution of each modality is the same, neglecting the intrinsic modality dependency bias. As a result, the modality hard to learn would be doubly penalized, and the performance of dynamical fusion could be inferior to that of static fusion. To address these challenges, we propose the Unbiased Dynamic Multimodal Learning (UDML) framework. Specifically, we introduce a noise-aware uncertainty estimator that adds controlled noise to the modality data and predicts its intensity from the modality feature. This forces the model to learn a clear correspondence between feature corruption and noise level, allowing accurate uncertainty measure across both low- and high-noise conditions. Furthermore, we quantify the inherent modality reliance bias within multimodal networks via modality dropout and incorporate it into the weighting mechanism. This eliminates the dual suppression effect on the hard-to-learn modality. Extensive experiments across diverse multimodal benchmark tasks validate the effectiveness, versatility, and generalizability of the proposed UDML. The code is available at https://github.com/shicaiwei123/UDML.
IRDec 25, 2025
A Knowledge Graph and Deep Learning-Based Semantic Recommendation Database System for Advertisement Retrieval and PersonalizationTangtang Wang, Kaijie Zhang, Kuangcong Liu
In modern digital marketing, the growing complexity of advertisement data demands intelligent systems capable of understanding semantic relationships among products, audiences, and advertising content. To address this challenge, this paper proposes a Knowledge Graph and Deep Learning-Based Semantic Recommendation Database System (KGSR-ADS) for advertisement retrieval and personalization. The proposed framework integrates a heterogeneous Ad-Knowledge Graph (Ad-KG) that captures multi-relational semantics, a Semantic Embedding Layer that leverages large language models (LLMs) such as GPT and LLaMA to generate context-aware vector representations, a GNN + Attention Model that infers cross-entity dependencies, and a Database Optimization & Retrieval Layer based on vector indexing (FAISS/Milvus) for efficient semantic search. This layered architecture enables both accurate semantic matching and scalable retrieval, allowing personalized ad recommendations under large-scale heterogeneous workloads.
CVApr 4
XSeg: A Large-scale X-ray Contraband Segmentation Benchmark For Real-World Security ScreeningHongxia Gao, Litao Li, Yixin Chen et al.
X-ray contraband detection is critical for public safety. However, current methods primarily rely on bounding box annotations, which limit model generalization and performance due to the lack of pixel-level supervision and real-world data. To address these limitations, we introduce XSeg. To the best of our knowledge, XSeg is the largest X-ray contraband segmentation dataset to date, including 98,644 images and 295,932 instance masks, and contains the latest 30 common contraband categories. The images are sourced from public datasets and our synthesized data, filtered through a custom data cleaning pipeline to remove low-quality samples. To enable accurate and efficient annotation and reduce manual labeling effort, we propose Adaptive Point SAM (APSAM), a specialized mask annotation model built upon the Segment Anything Model (SAM). We address SAM's poor cross-domain generalization and limited capability in detecting stacked objects by introducing an Energy-Aware Encoder that enhances the initialization of the mask decoder, significantly improving sensitivity to overlapping items. Additionally, we design an Adaptive Point Generator that allows users to obtain precise mask labels with only a single coarse point prompt. Extensive experiments on XSeg demonstrate the superior performance of APSAM.
CLSep 5, 2025
Research on Multi-hop Inference Optimization of LLM Based on MQUAKE FrameworkZucheng Liang, Wenxin Wei, Kaijie Zhang et al.
Accurately answering complex questions has consistently been a significant challenge for Large Language Models (LLMs). To address this, this paper proposes a multi-hop question decomposition method for complex questions, building upon research within the MQUAKE framework. Utilizing the LLAMA3 model, we systematically investigate the impact of multi-hop question decomposition within knowledge graphs on model comprehension and reasoning accuracy, both before and after model training. In our experiments, we systematically partitioned and converted the MQUAKE-T dataset into two distinct formats: a single-hop dataset designed for directly answering complex questions, and a multi-hop dataset constructed using the multi-hop question decomposition method. We then fine-tuned the LLAMA3 model on these datasets and conducted inference tests. Our results demonstrate that, without fine-tuning the LLM, the prediction performance based on the multi-hop question decomposition method significantly outperforms the method of directly answering complex questions. After fine-tuning using the LoRA (Low-Rank Adaptation) method, the performance of both approaches improved compared to the untrained baseline. Crucially, the method utilizing multi-hop decomposition consistently maintained its superiority. These findings validate the effectiveness of the multi-hop decomposition method both before and after training, demonstrating its capability to effectively enhance the LLM's ability to answer complex questions.