Runze Xia

CL
h-index4
4papers
29citations
Novelty56%
AI Score41

4 Papers

CLJan 22
Hallucination Mitigating for Medical Report Generation

Ruoqing Zhao, Runze Xia, Piji Li

In the realm of medical report generation (MRG), the integration of natural language processing has emerged as a vital tool to alleviate the workload of radiologists. Despite the impressive capabilities demonstrated by large vision language models (LVLMs) in understanding natural language, their susceptibility to generating plausible yet inaccurate claims, known as ``hallucinations'', raises concerns-especially in the nuanced and critical field of medical. In this work, we introduce a framework, \textbf{K}nowledge-\textbf{E}nhanced with Fine-Grained \textbf{R}einforced Rewards \textbf{M}edical Report Generation (KERM), to tackle the issue. Our approach refines the input to the LVLM by first utilizing MedCLIP for knowledge retrieval, incorporating relevant lesion fact sentences from a curated knowledge corpus. We then introduce a novel purification module to ensure the retrieved knowledge is contextually relevant to the patient's clinical context. Subsequently, we employ fine-grained rewards to guide these models in generating highly supportive and clinically relevant descriptions, ensuring the alignment of model's outputs with desired behaviors. Experimental results on IU-Xray and MIMIC-CXR datasets validate the effectiveness of our approach in mitigating hallucinations and enhancing report quality.

CLSep 30, 2024
Decoding the Echoes of Vision from fMRI: Memory Disentangling for Past Semantic Information

Runze Xia, Congchi Yin, Piji Li

The human visual system is capable of processing continuous streams of visual information, but how the brain encodes and retrieves recent visual memories during continuous visual processing remains unexplored. This study investigates the capacity of working memory to retain past information under continuous visual stimuli. And then we propose a new task Memory Disentangling, which aims to extract and decode past information from fMRI signals. To address the issue of interference from past memory information, we design a disentangled contrastive learning method inspired by the phenomenon of proactive interference. This method separates the information between adjacent fMRI signals into current and past components and decodes them into image descriptions. Experimental results demonstrate that this method effectively disentangles the information within fMRI signals. This research could advance brain-computer interfaces and mitigate the problem of low temporal resolution in fMRI.

IROct 13, 2025
From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance

Runze Xia, Yupeng Ji, Yuxi Zhou et al.

Query-service relevance prediction in e-commerce search systems faces strict latency requirements that prevent the direct application of Large Language Models (LLMs). To bridge this gap, we propose a two-stage reasoning distillation framework to transfer reasoning capabilities from a powerful teacher LLM to a lightweight, deployment-friendly student model. In the first stage, we address the limitations of general-purpose LLMs by constructing a domain-adapted teacher model. This is achieved through a three-step process: domain-adaptive pre-training to inject platform knowledge, supervised fine-tuning to elicit reasoning skills, and preference optimization with a multi-dimensional reward model to ensure the generation of reliable and preference-aligned reasoning paths. This teacher can then automatically annotate massive query-service pairs from search logs with both relevance labels and reasoning chains. In the second stage, to address the challenges of architectural heterogeneity in standard distillation, we introduce Contrastive Reasoning Self-Distillation (CRSD). By modeling the behavior of the same student model under "standard" and "reasoning-augmented" inputs as a teacher-student relationship, CRSD enables the lightweight model to internalize the teacher's complex decision-making mechanisms without needing the explicit reasoning path at inference. Offline evaluations and online A/B testing in the Meituan search advertising system demonstrate that our framework achieves significant improvements across multiple metrics, validating its effectiveness and practical value.

CVMay 28, 2025
Improving Brain-to-Image Reconstruction via Fine-Grained Text Bridging

Runze Xia, Shuo Feng, Renzhi Wang et al.

Brain-to-Image reconstruction aims to recover visual stimuli perceived by humans from brain activity. However, the reconstructed visual stimuli often missing details and semantic inconsistencies, which may be attributed to insufficient semantic information. To address this issue, we propose an approach named Fine-grained Brain-to-Image reconstruction (FgB2I), which employs fine-grained text as bridge to improve image reconstruction. FgB2I comprises three key stages: detail enhancement, decoding fine-grained text descriptions, and text-bridged brain-to-image reconstruction. In the detail-enhancement stage, we leverage large vision-language models to generate fine-grained captions for visual stimuli and experimentally validate its importance. We propose three reward metrics (object accuracy, text-image semantic similarity, and image-image semantic similarity) to guide the language model in decoding fine-grained text descriptions from fMRI signals. The fine-grained text descriptions can be integrated into existing reconstruction methods to achieve fine-grained Brain-to-Image reconstruction.